Multilevel Time Series Complexity

JOURNAL OF APPLIED
COMPUTER SCIENCE
Vol. 19 No. 2 (2011), pp. 61-71
Bohdan Kozarzewski
University of Information Technology and Management
Faculty of Applied Computer Science
H. Sucharskiego 2, 35-225 Rzeszów, Poland
[email protected]
Abstract. A simple and fast algorithm for quantifying time series complexity is proposed. It follows a recently developed nonparametric complexity measure of symbolic sequences. To obtain a complexity measure over many time scales, I suggest using multilevel wavelet decomposition of the time series instead of coarse-graining. As an example, the multilevel complexity is calculated for series generated by the Henon map and for data downloaded from the PhysioBank database: synthetic series, gait dynamics, and interbeat heart rate.
Keywords: complexity, wavelet decomposition, biomedical signals.
1. Introduction
The notion of time series complexity remains somewhat abstract. There is
no precise formal definition of time series complexity; it is only vaguely defined,
and many alternatives have been proposed. Nor is there agreement on how to quantify it. The mathematical definition of symbolic sequence
complexity due to Kolmogorov relies on information theory: the complexity
of a symbolic sequence is the length of the shortest binary input to a
universal Turing machine that produces that sequence. Unfortunately, this definition is of little help in practical applications. On the other
hand complexity seems to be essential to understand the underlying mechanisms
behind complex systems.
In practice, two main approaches to quantifying time series complexity
are in use. The first relies on information entropy as the tool for defining complexity.
The complexity measure is defined as the difference between two entropies: the sum
of local entropies and the global entropy [1], or the entropies of the time series computed for
two subsets of different length [2]. See also [3], where a modified version of the complexity measure called approximate entropy [4] is used. The second approach is
close to the Kolmogorov definition and measures the complexity of a symbolic sequence.
The algorithm proposed by Ke and Tong [5] consists of rules for parsing strings of
symbols from a finite alphabet. It is a significant
modification of the Lempel-Ziv complexity algorithm [6]. The authors call their
measure the lattice complexity.
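For orientation, the Lempel-Ziv parsing [6] that the lattice approach modifies can be sketched in a few lines. The Python function below (the name lz76_complexity is hypothetical and is not taken from [5] or [6]) counts the components of the exhaustive-history parsing of a symbol string. It is a minimal sketch of Lempel-Ziv complexity only, not of lattice complexity, and it favors readability over speed.

```python
def lz76_complexity(s):
    """Count components in the Lempel-Ziv (1976) parsing of the string s.

    A new component starting at position i is extended for as long as it
    can be copied from the part of the sequence that precedes its last symbol.
    """
    n = len(s)
    i = 0          # start of the current component
    count = 0      # number of components found so far
    while i < n:
        length = 1
        # Grow the candidate while it already occurs in the available history.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count
```

A regular string is parsed into very few components, while an irregular one needs many; for instance, a string of ten zeros is split into just two components by this routine.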
In the present paper I focus on the complexity of binary symbolic sequences, as they can represent, to some extent, the systems whose complexity
I would like to estimate. The paper is organized as follows. In Sec. 2 I analyze lattice complexity and compare this measure of complexity with sample entropy for
series generated by the Henon map and for some data taken from the PhysioBank database
[7]. In Sec. 3 I turn to the complexity measure across multiple temporal scales
and suggest multilevel wavelet decomposition of the signal instead of the multiscale
method developed in [2].
2. Lattice complexity
One motivation for using the techniques of symbolic dynamics in the study
of dynamical systems is that symbolic dynamics provides the only rigorous treatment
of chaotic systems. In the following I use the simplest symbolic representation of the system, obtained by partitioning the state space into two sets and labeling each
element of this partition by "0" or "1". In searching for an adequate measure of
the complexity of binary strings, two limiting cases must be considered: regular strings (such as periodic sequences) and random ones. A good measure
of physical complexity is expected to yield a vanishing complexity in both cases,
while other strings, which appear to encode a lot of information, are considered
complex.
The nonparametric measure proposed in [5] satisfies the above-mentioned conditions. In my opinion it is currently the best time series complexity measure.
The algorithm proposed there consists of rules for parsing a sequence of
symbols from a finite alphabet into specified subsequences, which the authors call lattices. A lattice is a subsequence with the following properties: it includes an iterative sequence as its prefix; it remembers the history of the sequence and can repeat
any series of successive operations held in memory; and the last symbol of a lattice must
be inserted into the lattice unless the end of the series is reached. The Appendix lists
the corresponding three-step procedure, written in Scilab [8], for extracting a lattice
from any sequence.
The measure of lattice complexity of a symbol sequence is simply the number of
lattices in the sequence. I make two minor changes to the original definition: to
make the lowest complexity value equal to zero I subtract 1 from the original measure, and to avoid large numbers I divide the result by n log2 n, where n is the length of
the series. As a result, the lattice complexity measure used in the present paper
is restricted to the range between zero and approximately one. Complexity analysis
helps to detect whether there is any mechanism, or some dynamical system, behind
the output time series, and it is essential for understanding the mechanisms
underlying dynamical systems. Complexity analysis of biological time series (recordings of gait dynamics or heart interbeat intervals, for example) appears to be useful
in discriminating between series from healthy persons and series from patients with some
disease.
To get some insight into lattice complexity I compare the two measures described above,
i.e. the sample entropy (SE) and the lattice complexity (Lc), for some series. SE
is a measure that quantifies the unpredictability of time series data. It reflects
the likelihood that similar observations will not be followed by additional similar observations. Let $(x_1, ..., x_N)$ represent a time series of length $N$ and let $u_m(i) = (x_i, x_{i+1}, ..., x_{i+m-1})$ be a vector of size $m$. Let $n_i^m(r)$ be the number of vectors $u_m(j)$
within distance $r$ of the vector $u_m(i)$. The distance between two vectors is calculated
as the maximum absolute difference between their corresponding components. If I
define
$$\Phi^m(r) = \frac{1}{N-m} \sum_{i=1,\,(i \neq j)}^{N-m} \ln \frac{n_i^m(r)}{N-m} \tag{1}$$
then the sample entropy for a finite time series is given by
$$SE(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r). \tag{2}$$
Sample entropy is a function of the parameters $m$ and $r$; it depends only weakly on the time
series length $N$ when $N$ exceeds approximately $10^3$.
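Equations (1) and (2) can be evaluated directly, as in the following Python sketch. The function names phi and entropy_estimate are hypothetical. Two choices are assumptions of the sketch and are not fixed by the text: "within distance r" is read as "distance not larger than r", and vectors that have no neighbour at all are skipped so that the logarithm stays defined. The tolerance r is taken as an absolute number. The double loop is quadratic in N, so this is meant for illustration on short series.

```python
import math

def phi(x, m, r):
    """Phi^m(r) of Eq. (1) for the sequence x, with self-matches excluded."""
    n_vec = len(x) - m                       # N - m template vectors
    vectors = [x[i:i + m] for i in range(n_vec)]
    total = 0.0
    for i in range(n_vec):
        count = 0
        for j in range(n_vec):
            if j == i:
                continue                     # i != j: do not count the vector itself
            # Maximum absolute difference between corresponding components.
            dist = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
            if dist <= r:                    # assumption: boundary is included
                count += 1
        if count > 0:                        # assumption: skip isolated vectors
            total += math.log(count / n_vec)
    return total / n_vec

def entropy_estimate(x, m, r):
    """SE(m, r, N) of Eq. (2): Phi^m(r) - Phi^(m+1)(r)."""
    return phi(x, m, r) - phi(x, m + 1, r)
```

For a perfectly regular input such as a constant series the estimate is close to zero, as expected of an unpredictability measure.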
Figure 1. Complexity measure of the Henon map generated series for b = 0.3. Left
- sample entropy, right - lattice complexity.
As a time series I use the output of the chaotic Henon map, which is usually treated as a
two-dimensional map but can be rewritten as
$$x_i = b x_{i-2} - a x_{i-1}^2 + 1. \tag{3}$$
This map reduces to the logistic map when b = 0, becomes conservative when
b = 1, and is dissipative between the two cases. I focus my analysis on the
parameter value b = 0.3, where the complexity of the system significantly exceeds zero. I
let the parameter a vary within the range 1 ≤ a ≤ 1.42, where the series is periodic,
chaotic, or divergent in different subranges. For values of a within this
range, taken with increment 0.005, I generate series of $2^{15}$ elements and calculate for
each series its sample entropy (with parameters m = 1 and r = 0.15, as in [3]) and
its lattice complexity.
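The series used in this experiment can be generated from the recurrence (3), as in the Python sketch below. The function name henon_series, the initial values x0 = x1 = 0 and the escape threshold are assumptions of the sketch; the text does not state which initial conditions were used or how divergent orbits were handled.

```python
def henon_series(a, b=0.3, n=2**15, x0=0.0, x1=0.0, escape=1.0e6):
    """Iterate x_i = b*x_{i-2} - a*x_{i-1}^2 + 1 and return the list of values.

    The iteration stops early once |x_i| exceeds `escape`, which is taken
    here as a sign that the orbit diverges (an assumption of this sketch).
    """
    x = [x0, x1]
    for i in range(2, n):
        nxt = b * x[i - 2] - a * x[i - 1] * x[i - 1] + 1.0
        if abs(nxt) > escape:
            break                      # divergent orbit: return what we have
        x.append(nxt)
    return x

# Parameter sweep described in the text: a from 1.0 to 1.42 in steps of 0.005.
a_values = [1.0 + 0.005 * k for k in range(85)]
```

A series returned with fewer than n elements signals divergence for that value of a, so such values can be treated separately from the periodic and chaotic ones.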
One can see that the complexity is close to zero, with a small chaotic component, in
the parameter range where the series is mostly periodic, and equals zero above approximately a = 1.42. In the parameter range where the complexity takes larger values, the series
exhibits more or less complex dynamics. The overall behavior of sample entropy and
lattice complexity is qualitatively similar; however, there are substantial differences in the details.
3. Multiscale (multiresolution) lattice complexity
The measures of time series complexity discussed above quantify the degree
of complexity on a single time scale only. Very often, however, the output of a dynamical
system has a complicated temporal structure on different scales. Examples are the
outputs of multiple physiologic control mechanisms. In particular, heart rate variability is an output of mechanisms that operate on a wide range of time scales. Costa et al. [2]
have proposed a method for calculating multiscale entropy (MSE) from complex signals. For a time series $(x_1, ..., x_N)$ they construct a set of coarse-grained time series $y^{(l)}$
by averaging $l$ data points $(x_i, x_{i+1}, ..., x_{i+l-1})$ in consecutive nonoverlapping windows.
The number $l$ is called the scale of the coarse-graining procedure. The length of the coarse-grained series decreases with the scale as $N/l$, which results in an increasing error in
sample entropy at higher scales. Sample entropy indeed depends on the scale factor
$l$ and, what is more, the character of the function SE(l) depends on the exponent of the power-law
correlations of the time series. Scale-dependent sample entropy makes it possible to discriminate,
for example, between output signals generated by a dynamical system in different environments, or between healthy persons and patients with some health failures. As we will see, so does the multilevel lattice complexity; however, there are
significant differences between the two approaches to scale-dependent complexity. To explain them I consider some examples.
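The coarse-graining step described above amounts to averaging over consecutive nonoverlapping windows. A minimal Python sketch follows; the function name coarse_grain is hypothetical, and dropping an incomplete trailing window is an assumption of the sketch. Here the argument scale is the window length l, so scale = 1 returns the original values.

```python
def coarse_grain(x, scale):
    """Average x over consecutive nonoverlapping windows of `scale` points."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    n_windows = len(x) // scale      # incomplete trailing window is dropped
    return [sum(x[k * scale:(k + 1) * scale]) / scale
            for k in range(n_windows)]
```

The output has len(x) // scale elements, which makes the shrinking of the series with increasing scale, and hence the growing statistical error noted in the text, directly visible.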
3.1. Surrogate time series
One can assume that the simplest way to obtain a scale-dependent lattice complexity
is to adopt the coarse-graining procedure. To test this hypothesis I selected four surrogate time series downloaded from the public domain archive PhysioBank [7]. They
belong to the synthetic data category and are characterized by different values of the exponent α.
The signals of interest are labeled 0117, 0517, 0917 and 1517, where "01" means
α = 0.1, "17" means signal length $2^{17}$, and so on. The exponent α measures
the degree of correlations in the signal, $F(n) \sim n^{\alpha}$, where F is the root mean square
fluctuation function and n is the box length. For more details see [9]. More precisely, I
restricted myself to shorter signals of length $2^{14}$ and coarse-graining scales from 0
to 10, where scale 0 corresponds to the original signal without coarse-graining. The results, shown in the
left plot of Fig. 2, are a bit surprising: within numerical error, the lattice complexity
does not depend on the coarse-graining scale. This means that coarse-graining of the time
series fails to quantify lattice complexity across multiple scales.
There are at least two methods to develop scale dependent lattice complexity.
The first one consists of a wavelet decomposition of the signal and the use of approximation wavelet coefficients instead of the coarse-grained series. The second one could
be a fine-graining procedure that converts the digital series into a symbolic one by
refining the alphabet to a sufficiently large size. That, however, leads to a significant increase in computational complexity.
Here I restrict myself to the former method and the simplest alphabet, of
size 2. Details of the method are as follows. The wavelet decomposition of a numerical
series at some level $l$ is a structure $[A_l, D_l]$ that contains the decomposition vector
$A_l$ and the bookkeeping vector $D_l$. The decomposition vector $A_l$ is composed of the approximation coefficients at level $l$ and the detail coefficients at levels 1, 2, ..., $l$. There is
evidence that the series of decomposition vectors have the same complexity as the original
time series at the corresponding level. Performing the wavelet decomposition of a signal, I collect the decomposition vectors at levels from 1 to some reasonable maximum
level $l_{max}$, which depends on the length of the time series and on the particular wavelets used. In
the next step the relative differences of consecutive elements of the decomposition vectors
(and of the original series, considered as level $l = 0$) are calculated
$$r(i) = \frac{A(i) - A(i-1)}{A(i-1)}, \tag{4}$$
and then transformed into a binary symbolic sequence according to the following
rule
$$r(i) \longrightarrow \begin{cases} 0 & \text{if } r(i) < 0 \\ 1 & \text{otherwise.} \end{cases} \tag{5}$$
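The symbolization defined by Eqs. (4) and (5) can be sketched in Python as follows. The function name symbolize is hypothetical. The handling of a zero denominator is an assumption of the sketch: the relative difference is then undefined, and the symbol 1 is emitted because the "otherwise" branch of rule (5) is taken.

```python
def symbolize(series):
    """Map a numerical series to 0/1 symbols via relative differences.

    For each pair of consecutive elements, r = (cur - prev) / prev is
    computed as in Eq. (4); the symbol is 0 if r < 0 and 1 otherwise (Eq. 5).
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        if prev == 0:
            symbols.append(1)          # assumption: undefined r counts as "otherwise"
        else:
            r = (cur - prev) / prev
            symbols.append(0 if r < 0 else 1)
    return symbols
```

Note that the sign of r depends on the sign of the previous element as well as on the direction of the change, so for negative values a decrease is coded as 1. The symbols can be joined into a string, e.g. "".join(map(str, symbolize(data))), before being parsed.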
In that way I get a set of $l_{max} + 1$ symbolic series (each of the same length as the
signal itself) representing the original time series and the decomposition vectors at all
levels considered. The set of lattice complexities Lc(l) for all levels I call the multilevel lattice complexity. Multilevel lattice complexity depends to some extent on
the wavelet family used and very weakly on the time series length. In the following the
Haar wavelet family is used. I now analyze some examples
which show the potential ability of multilevel complexity to discriminate between time series
generated by particular dynamics.
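For the Haar family the decomposition vectors can be computed without a wavelet library, as in the Python sketch below. The function names are hypothetical. The ordering of the coefficients (approximation at level l first, then details from the coarsest to the finest level) follows the common wavedec convention and is an assumption here, as is the requirement that the series length be divisible by 2 to the power of the level. The bookkeeping vector is not constructed.

```python
import math

def haar_decomposition_vector(x, level):
    """Return [A_l, D_l, D_{l-1}, ..., D_1] for the orthonormal Haar wavelet."""
    approx = [float(v) for v in x]
    details = []                       # kept coarsest-first
    s = math.sqrt(2.0)
    for _ in range(level):
        if len(approx) < 2 or len(approx) % 2 != 0:
            raise ValueError("length must be divisible by 2**level")
        half = len(approx) // 2
        # Pairwise differences give the details, pairwise sums the approximation.
        detail = [(approx[2 * k] - approx[2 * k + 1]) / s for k in range(half)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / s for k in range(half)]
        details.insert(0, detail)
    vector = list(approx)
    for detail in details:
        vector.extend(detail)
    return vector

def multilevel_vectors(x, max_level):
    """Original series (level 0) followed by decomposition vectors 1..max_level."""
    return [list(x)] + [haar_decomposition_vector(x, l)
                        for l in range(1, max_level + 1)]
```

Each decomposition vector has the same length as the input, in line with the statement above that every level yields a series as long as the signal itself.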
To begin, I turn again to the surrogate time series discussed earlier.
I perform a multilevel decomposition of the time series (instead of coarse-graining)
and calculate the lattice complexity according to the algorithm described above. Wavelet decomposition clearly discriminates between series with different correlations and
can be used as a tool for extracting time series structures at multiple scales.
Figure 2. Lattice complexity of synthetic series over a range of scales. Left - coarse-grained scaling, right - multilevel wavelet decomposition.
3.2. Gait dynamics
Now I test the discrimination ability of the multilevel complexity measure on data
containing stride intervals. Human gait is one of the complex mechanisms by which
the human body interacts with the environment. It is known that there is a
random variation in the stride interval of humans during walking and that this variability exhibits long-time correlations. The fractal and multifractal properties of
stride interval time series were studied using, among others, the distribution of the
local Hölder exponents [10]. The authors established that the stride interval time series
is more complex than a monofractal phenomenon and that a slightly multifractal
and non-stationary time series emerges under different gait conditions. Besides,
many diseases affect gait cycle duration and general gait dynamics. A better understanding of gait dynamics may be useful as a diagnostic and prognostic aid for
therapeutic intervention. I focus on the time series of
intervals between successive strides in human gait and restrict myself to studying
the multilevel complexity of the stride intervals of young healthy individuals under different circumstances. More precisely, the data I selected are sets of stride interval
time series for 5 healthy young men walking at a fast pace in both free and metronomically triggered conditions. The relevant data were taken from the PhysioBank signal
archives - Gait databases (the 5 longest files from the set si01.fast to si10.fast and from
*.metfast). Fig. 3 shows averaged results for both fast and metronomic fast walking.
Figure 3. Lattice complexity distribution of spontaneous fast and metronomic fast
walking
Again, wavelet decomposition clearly discriminates between the stride interval time
series recorded under the two conditions at the lower levels. The result suggests that the output of
walking at a fast pace in free conditions is more complex at medium scales and is
indistinguishable from the output of metronomically triggered walking at long scales.
3.3. Human heart interbeat rate
Heart rate variability analysis is among the relatively simple methods for studying the
physiologic mechanisms responsible for the control of heart rate fluctuations, in
which the autonomic nervous system appears to play a primary role. Heart rate
variability typically shows a complex behaviour, which is believed to reflect the
complexity of a central physiologic control system. Variability of complex behaviour
has been observed in many clinical states: autonomic neuropathy, heart transplantation, congestive heart failure, and other cardiac and non-cardiac diseases. Heart
rate variability depends on age as well as on the behavioral state of the individual, e.g.
usual daytime activity and sleep at night. In the present example I analyze multilevel complexity to answer the question whether there is any characteristic difference in
the scaling behavior between the heart dynamics of young and elderly individuals.
I used the interbeat heart rate records from a mini collection of the PhysioBank signal archives called Fantasia Database Subset. This collection consists of 10 heart
beat time series (about 5000 elements long) from two groups of healthy men, five
young (average age about 26 years) and five elderly (average age about 74 years). Fig.
4 shows the averaged result of the multilevel lattice complexity analysis for each
group.
Figure 4. Lattice complexity distribution of heart interbeat rate for young and elderly individuals
Lattice complexity combined with wavelet decomposition clearly discriminates between the interbeat rate time series of the two groups of individuals at all scales. It
is interesting to note that lattice complexity in the elderly group is higher at all levels,
in contrast to the entropy results reported in [3], according to which there is a loss of complexity
with disease and aging. The topic needs further investigation.
4. Conclusions
The main objective of the present paper was to test the ability of lattice complexity to distinguish effectively, on many time scales, between signals generated
by healthy individuals in different conditions, or by healthy individuals and those
diseased in body or mind. All the examples considered show that lattice complexity is
up to the task. There is hope that the multilevel lattice complexity of the output
of a dynamical system may make it possible to learn more about the underlying mechanism and may be
of practical importance.
References
[1] Rajkovič, M., Entropic nonextensivity as a measure of time series complexity,
[on-line]. Access:ArXiv:nlin/00404019v1, 2004, /[2010-12-01].
[2] Costa, M., Peng, C.-K., Goldberger, A. L., and Hausdorff, J. M., Multiscale
entropy analysis of human gait dynamics, Physica A, Vol. 330, 2003, pp. 53–60.
[3] Costa, M., Goldberger, A. L., and Peng, C.-K., Multiscale entropy analysis of
biological signals, Physical Review E, Vol. 71, No. 2, 2005, pp. 021906–1–021906–17.
[4] Pincus, S. M., Assessing serial irregularity and its implications for health,
Annales N Y Acad. Sci., 2001, pp. 245–267.
[5] Ke, D.-G. and Tong, Q.-Y., Easily adaptable complexity measure for finite time series, Physical Review E, Vol. 77, No. 5, 2008, pp. 066215–1 –
066215–8.
[6] Lempel, A. and Ziv, J., On the complexity of finite sequences, IEEE Trans.
Inform. Theor., Vol. IT-22, 1976, pp. 75–81.
[7] PhysioBank, Tech. rep., PhysioBank Archive Index, [on-line]. Access:
http://wwww.physionet.org, /[2010-12-01].
[8] Scilab-5.0.3, Tech. rep., Consortium Scilab (INRIA, ENPC), [on-line]. Access: http://wwww.scilab.org, /[2010-12-01].
[9] Xu, L., Ivanov, P. C., Hu, K., Carbone, A., and Stanley, H. E., Quantifying signals with power-law correlations, Physical Review E, Vol. 71, No. 5,
2005, pp. 051101–1 – 051101–14.
[10] Scafetta, N., Griffin, L., and West, B. J., Holder exponent spectra for human
gait, Physica A, Vol. 328, 2003, pp. 561–583.
Appendix
function La=Lattice(y,x)
//y - part of series already partitioned into lattices,
//x - the rest of the series
//La - the lattice extracted from the beginning of x
n=length(x);
//1- - - - - extend P until the next symbol already occurs in P
P=x(1); i=2;
k=strindex(P,x(i));
while(isempty(k))
  P=strcat([P,x(i)]);
  if(i==n), La=P; return; end;
  i=i+1;
  k=strindex(P,x(i));
end;
Q=strcat([P,x(i)]);
//2- - - - - build P from position k and R from position i, compare them
j=k;P='';R='';
while(k)
  j=j+1; P=strcat([P,x(j)]);
  if(i==n), La=P; return; end;
  i=i+1;
  R=strcat([R,x(i)]);
  k=strcmp(P,R);
end;
//3- - - - - extend Q while it can still be found in the history S
Q=strcat([Q,R]);
P=Q(1:end-1);
i=length(Q);
S=strcat([y,P]);
k=strindex(S,Q);
while(~isempty(k))
  if(i==n), La=Q; return; end;
  i=i+1;
  Q=strcat([Q,x(i)]);
  k=strindex(S,Q);
end;
La=Q;
endfunction