Features Extraction and Reconstruction of Country

Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 31 (2014) 265 – 272
2nd International Conference on Information Technology and Quantitative Management, ITQM
2014
Features Extraction and Reconstruction of Country Risk
based on Empirical EMD
Xiaoyang Yaoa,b, Xiaolei Suna, Yuying Yanga,b, Dengsheng Wu a,*, Xun Liangc
a
Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, P.R.China
b
University of Chinese Academy of Sciences,Beijing 100039, P.R.China
c
School of Information, Renmin University of China, Beijing 100872, China
Abstract
In the application of the Empirical Mode Decomposition (EMD), reconstruction to the intrinsic mode functions (IMFs) which are
obtained by EMD is necessary in order to simplify analysis and make reconstruction results of more economic explanatory power.
At present, there are two main reconstruction methods; one is based on the changing of data construction, represented by the fineto-coarse method, the other one takes the correlation of the IMFs into consideration, for example, calculating the correlation
between the marginal spectrums of different IMFs. In order to study the internal unity and differences between the two methods,
country risk data of the BRICS countries are selected to make the empirical analysis. The results are as follows. Firstly, it is not
reasonable that the residue obtained by the EMD is directly regarded as the trend of the original data. Secondly, by fine-to-coarse,
all the IMFs can be reconstructed to three time scales, which are denoted as high-frequency mode, low-frequency mode and trend
respectively, but explanation of these scales for the real situation is not satisfactory. At last, trend which is extracted based on the
correlation of the IMF marginal spectrums can describe the basic behavior of the original data. Contrasted to fine-to-coarse, the
results obtained by the second method are more reasonable.
©
by Elsevier
B.V.
access under CC BY-NC-ND license.
© 2014
2014 Published
The Authors.
Published
by Open
Elsevier
B.V.
Selection
and
peer-review
under
responsibility
of the Organizing Committee of ITQM 2014.
Selection and peer-review under responsibility of the Organizing Committee of ITQM 2014.
Keyword: EEMD, reconstruction, fine-to-coarse, correlation, Hilbert-Huang tranform, Hilbert marginal spectrums
* Corresponding author. Tel.: 86-10-59358813
E-mail address: [email protected]
1877-0509 © 2014 Published by Elsevier B.V. Open access under CC BY-NC-ND license.
Selection and peer-review under responsibility of the Organizing Committee of ITQM 2014.
doi:10.1016/j.procs.2014.05.268
266
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
1. Introduction
Empirical Mode Decomposition (EMD) is a new time-frequency data analysis method proposed by Huang et al in
1998 1. It is proved versatile in the applications for extracting signals from data, including the nonlinear and nonstationary process. In general, EMD is intuitive, direct and adaptive. Later for the purpose of solving the modes
mixing problem, Huang et al 2 proposed an improved EMD method (Ensemble Empirical Mode Decomposition) by
introducing the white noise. Up to now, EMD is widely applied in many areas, such as mechanical fault diagnosis 3-5
and images feature identification 6-9. Furthermore, EMD and its thought of decomposition have also been used in
analyzing and forecasting of economic data 10-14. The essence of the method is to identify the intrinsic oscillatory
modes by their characteristic time scales in the data empirically, and then decompose the data accordingly. Each
derived intrinsic mode is dominated by scales in a narrow range. Thus, according to the scale, the concrete
implications of each mode can be identified.
Unlike the image and mechanical fields, in economic field, it is of great importance to identify internal periodic
variation rule and underlying trend of the original data precisely. However, because of the complexity and
mutability of economic data, analysing original data directly cannot get enough useful information to find the
internal changing characteristics. For example, some short-term fluctuation may cover up underlying law of data. As
a result, it is very difficult to grasp the economic law and identify the economic stages properly. Given this, some
researchers introduced EMD into the analysis and application of time series data, decomposing original data into
finite IMFs that reflect different scales information. The dilemma between difficulties in modelling and lack of
economic meaning can be solved by EMD then 15. However, in the practical application, it is of little necessity to
use IMFs obtained by EMD directly, because some IMFs indicate similar information and not each IMF can be
given corresponding economic meaning. Considering about this, reconstruction to IMFs is quite necessary, which
aims to enhance explanatory of the refactoring series and simplify the process. In current, there are two main
reconstruction methods. The first one is based on the changing of the data construction, represented by the fine-tocoarse method which has been applied in many researches. For example, Zhang X 15 decomposed crude oil price
through EEMD firstly in order to study the underlying characteristics of international crude oil price movements.
Then based on fine-to-coarse, IMFs except the residue are reconstructed into two time scales. One is the highfluctuating process, the other one is slow-varying mode. Residue is directly seen as the trend of original data.
Furthermore, combining the features of crude price oil, some analysis of three time scales including residue are
given respectively; Tang L et al 16 identified multi-scales characteristics of country risk data by introducing the
EMD and fine-to-coarse. Three time scales characteristics, namely, short-term, medium-term and long trend are
extracted, that deepen the cognition of the country risk variation; Li J P et al 17 forecasted country risk data by
decomposing the country risk data and reconstructing IMFs through fine-to-coarse based on the “decompose and
integrate” theory.
However, Yang et al 18 thought that, since the last IMF is always monotonic, fine-to-coarse fails to extract the
underlying trend of many signals such as those signals whose underlying trend are not monotonic. So he proposed
another trend extraction method based on the correlation of different IMFs. Firstly, convert original data from time
domain to frequency domain. Then make the Hilbert-Huang transform and calculate the marginal spectrums of each
IMF. At last, reconstruct IMFs according to the correlation between different IMFs.
Although the two methods mentioned above have been applied respectively, till to now, no researches have
studied the internal unity and differences between the two methods both in theory and empirical experiments. So in
this paper, the two reconstruction methods are analysed so as to find the feasibility, advantages and limits of each
method. Experiments are also carried out to give empirical results.
The rest of this paper is organized as follows. In section 2, models used in this paper are briefly introduced. In
section 3, using ICRG country risk data of BRICS countries, empirical analysis is operated. At last, conclusions are
given in section 4.
2. Models
In this paper, original data are firstly decomposed into finite IMFs by EEMD. Then IMFs are reconstructed
respectively by fine-to-coarse and correlation method which is based on correlation of the IMF marginal spectrums.
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
267
EEMD is an improved EMD method, which consists of sifting an ensemble of white noise-added signal and treats
the mean as the final true result. Like EMD, original data x(t ) can be also represented as x (t ) ¦ ci (t ) rn through
EEMD. A complete sifting process stops when the residue rn , becomes a monotonic function from which no more
IMF can be extracted and the detail can be found in Huang et al 1.
2.1. fine-to-coarse reconstruction
Fine-to-coarse is a reconstructed method which is based on changing of data structure, the specific procedure are
as follows 15:
x Computing the mean of the sum of c1 (t ) to ci (t ) for each component (except for the residue);
x Using t-test to identify for which i the mean significantly departs from zero;
x Once i is identified as a significant change point, partial reconstruction with IMFs from this to the end, is
identified as the low-frequency mode and the partial reconstruction with other IMFs is identified as the highfrequency mode.
x Residue is directly regarded as the basic trend of the original data.
2.2. reconstruction based on Hilbert-Huang marginal spectrum 18
Theoretical basis of another reconstruction method is correlation between different IMFs. For the reason that
differences between correlation indices calculated directly by the IMFs data are not clear, correlation index in the
frequency domain instead of time domain is selected. It is well known that residue received after EEMD must
contain the tendency information of original data, but if last IMFs are located very close to each other in the
frequency domain, then they all should be remained to indicate basic tendency of original data. Based on this idea,
firstly Hilbert-Huang transform is carried on original data. Then marginal spectrum of each is calculated, at last
correlation index are obtained between adjacent IMFs.
Having decomposed original data into a finite number of IMFs, the instantaneous frequencies of the signal can be
calculated via the Hilbert transform 1, The Hilbert transform is initially applied to each IMF ci (t ) as follows:
zi (t )
ci (t ) j H [ci (t )] ai (t )e jTi (t )
where the Hilbert transform is defined as˖
H [ci (t )]
1
S
P³
f
f
ci (t ')
dt '
t t'
and P indicates the Cauchy principal value.
Clearly, ai (t ) , Ti (t ) are the instantaneous amplitude and phase of ci (t ) , respectively. So the instantaneous
frequency of ci (t ) is denoted as Zi (t ) which is defined as
Zi (t )
d T i (t )
dt
In order to represent the instantaneous amplitude and frequency of each IMF ci (t ) as functions of time in a three
dimensional space, time frequency representation of the amplitude is designated as the Hilbert spectrum of ci (t ) 1
denoted by H i Z , t , where H i Z , t =0 except H i Zi (t ), t ai (t ) . It gives the time frequency amplitude
distribution of ci (t ) .In practice, the frequency and the time are sampled as m'Z for m  Z , tn n't for n  Z ,
respectively. Hence, the Hilbert marginal spectrum of ci (t ) can be defined as:
268
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
hi m'Z ¦ H m 'Z , t i
n
n
At last, the correlation coefficients of the Hilbert marginal spectrums of two consecutive IMFs are calculated as
follows:
i
Ui
¦ (h m'Z h )(h m'Z h )
i
i
i
m
for i 1,", N 1
¦ (hi m'Z hi )2 (hi m'Z hi )2
m
In the practical operation process, an optimal error tolerance H which is greater than zero need to be defined
firstly, then find the largest n0 which satisfies following inequality: U i t H , i t n0 .
3. Empirical Results
3.1. Data
International Country Risk Guide(ICRG) is one of the most famous country risk rating, which provides monthly
data for a large number of countries. Recently, many country risk researches which are based on risk rating from
ICRG rise 19. In this paper, country risk ratings by ICRG are also selected as empirical data, covering from February
1997 to July 2013, with a total of 198 observations. And the BRICS countries, which include Brazil, Russia, India,
China and South Africa are selected as sample countries. On one hand, the BRICS countries are all developing
countries, which means they are in similar developing stage. On the other hand, the five countries locate in different
continents, which means culture, politics and customs of these countries differ from each other greatly. In a word,
the BRICS countries not only have lots of common points, but also have their own characteristics respectively.
3.2. Reconstruction results of BRICS country risk data based on fine-to-coarse
Decompose country risk data of five BRICS countries by EEMD, then reconstruct IMFs into three scales, that are
high-frequency mode, low-frequency mode and residue using fine-to-coarse. Reconstruction results are depicted in
Fig.1, and it can be found that:
x Residue obtained directly by EEMD mainly proves to be monotonous and deviates from original data clearly. It
cannot capture the fluctuation of the data, so it is unreasonable to regard the residue as the trend of the original
data directly.
x Comparing the results of five countries, there are something noteworthy. For Brazil, India, Russia and South
Africa, “high-frequency & residue” can describe original country risk data more exactly than “low-frequency &
residue”. However, for China, “low-frequency & residue” behaves much better.
x For any IMFs, fine-to-coarse can reconstructed them into three time scales: low-frequency mode, high-frequency
mode and residue. However when extracting the underlying trend, the reconstruction of IMFs seems to be lacking
of enough stability and interpretation of reality.
269
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
80
85
70
80
60
75
50
Feb/97
Feb/01
Apr/05
Jun/09
70
Feb/97
Jul/13
Feb/01
75
100
70
80
65
60
60
Feb/97
Feb/01
Apr/05
Apr/05
Jun/09
Jul/13
Jun/09
Jul/13
China
Brazil
Jun/09
40
Feb/97
Jul/13
Feb/01
India
Apr/05
Russia
80
original data
75
residue
high-frequency&residue
70
low-frequency&residue
65
Feb/97
Feb/01
Apr/05
Jun/09
Jul/13
South Africa
Fig.1Reconstruction results of BRICS country risk data based on fine-to-coarse
3.3. Reconstruction results of BRICS country risk data based on correlation of Hilbert marginal spectrum
Analyzing data in frequency domain instead of time domain can discover the internal features of data, such as
frequency and amplitude. After EEMD and Hilbert-Huang transform, marginal spectrums of adjacent IMFs are
calculated for each country in order to find the correlation between IMFs in frequency domain.
Extracting underlying trend of data is of great importance. Excluding some high-frequency incidents, internal
variation law of original data can be found by extracting the underlying trend. Further, according to the trend, future
tendency can be forecasted. However, residue is directly regarded as the underlying trend in fine-to-coarse. It is
unreasonable to remain only the residue but eliminate other similar IMFs as the representation of original data’s
tendency. Considering the situation, correlation index is introduced to measure the similarity degree of the last
several IMFs. If correlation between two adjacent IMFs is quite clear, they should both be remained. The correlation
index are revealed in Table 1. By performing intensive simulations on a wide varietyof signals, H 0.2 seems to be
a good choice ofmost signals 16. Taking Brazil for instance, U 4 0.2 means the correlation between IMF4 and IMF5
is not clear enough. U 5 ! 0.2, U 6 ! 0.2 indicates IMF5 ,IMF6, and IMF7 are closely related. So IMF5, IMF6 and
IMF7 should be retained to consitude the trend of original data.
Table 1 Correlation index between marginal spectrums of adjacent IMFs
Country
U1
U2
U3
U4
U5
U6
Brazil
0.1292
0.0323
0.0239
0.0130
0.0616
0.8386
China
0.2733
0.0769
0.0697
0.2682
0.4210
0.8390
India
0.1700
0.0357
0.6545
0.0001
0.0789
0.9843
Russia
0.1247
0.0633
0.0004
0.0131
0.5698
0.8297
South Africa
0.1384
0.0574
0.0062
0.2123
0.7630
0.4544
3.4. Comparison of results obtained by different methods†
†
In order to make the meanings clear,“Method One” refers to fine-to-coarse, “residue”, “high-frequency” and “low-frequency” are all obtained by
“Method One”. “Method Two” refers to trend extracting method based on the correlation between marginal spectrums of IMFs. “Trend” in this
270
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
Comparing and contrasting results obtained by two methods mentioned above, correlation index between different
serieses and original country risk data are displayed in Table 2.
x For all the five countries, correlation index between residue and original country risk data are the smallest in four
series, which once again indicates the irrationality of selecting residue alone as the underlying trend of original
data.
x Trend extracted by “Method Two” take correlation between adjacent IMFs into consideration. Through Table
1and Fig.2, it can be found that trend extracted by “Method Two” performs better than residue alone in fitting
original data. Furthermore, correlation index between trend extracted by “Method Two” and original data fall in
between two correlation indices that describe correlation between “high-frequency& residue”, “low-frequency&
residue” and original data, respectively.
x Generally speaking, last several IMFs ,which mainly consist of slow-varying processes and residue, can represent
the underlying tendency of original data. Considering five countries, except for China, correlation index between
original data and “low-frequency & residue” are not better than those between “trend” and original data. For
Brazil, India, Russia and South Africa, “high-frequency & residue” has a better performance than “low-frequency
& residue” in matching original data. In a word, in extracting underlying tendency, “Method Two” has a more
stable and explanatory performance than “Method One”.
Table 2 Correlation index between original data and different series obtained by different methods
Country
residue
high-frequency
low-frequency
&residue
&residue
trend
Brazil
0.6259
0.9747
0.7404
0.7404
China
0.5243
0.5340
0.9999
0.9513
India
0.5305
0.9977
0.5930
0.5930
Russia
0.6841
0.9981
0.7353
0.8607
South Africa
0.2246
0.9782
0.5015
0.8526
80
85
75
80
70
65
75
60
55
Feb/97
Feb/01
Apr/05
Jun/09
Jul/13
70
Feb/97
Feb/01
Apr/05
Jun/09
Jul/13
Jun/09
Jul/13
China
Brazil
75
90
80
70
70
60
65
50
60
Feb/97
Feb/01
Apr/05
Jun/09
Jul/13
40
Feb/97
Feb/01
India
Apr/05
Russia
80
original data
75
trend
high-frequency&residue
70
low-frequency&residue
65
Feb/97
Feb/01
Apr/05
Jun/09
Jul/13
South Africa
Fig.2 Comparison of reconstructed results by two methods
What’s more, in order to study what leads to differences of the results, selecting processes of IMFs in
reconstruction are compared. Results are displayed in Table 3.
part means the extracted trend by “Method Two”.
271
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
x Trend extracted by “Method Two” consists of “residue” and “low-frequency” obtained by fine-to-coarse,
including some “high-frequency” IMF, like Russia and South Africa. China is a special case, that only IMF1 is
identified as “high-frequency” but all other IMFs are identified as “low-frequency”. This can also explain why
correlation index between “low-frequency & residue” and original country risk data reach up to 99% in Table 2.
To some extent, reconstruction results by fine-to-coarse cannot be explained persuasively.
x Contrasting to fine-to-coarse, high-fluctuation process can be further identified when extracting trend using
“Method Two”. In “Method Two”, IMFs except extracted trend can be regarded as high-fluctuation process. So
for five countries, except for China, IMF1-5 are all identified as high-frequency series, and only IMF6 is thought
as low-frequency series in fine-to-coarse. However, if “Method Two” is applied, for Russia, only IMF1-4 are
identified as high-fluctuation process. For South Africa, only IMF1-3 are identified as high-fluctuation process.
Table 3 Process of selecting IMF in two different methods (numbers in table indicate the order of IMF)
Country
Method One
Method Two
high-frequency
low-frequency
residue
non-trend
Brazil
1,2,3,4,5
6
7
1,2,3,4,5
trend
6,7
China
1
2,3,4,5,6
7
1,2,3
4,5,6,7
India
1,2,3,4,5
6
7
1,2,3,4,5
6,7
Russia
1,2,3,4,5
6
7
1,2,3,4
5,6,7
South Africa
1,2,3,4,5
6
7
1,2,3
4,5,6,7
4. Conclusions
EMD method is widely applied in many fields because it is intuitive, direct and adaptive. In the application of the
EMD, it is necessary to reconstruct IMFs in order to simplify the analysis and give different time-scales series
economic meaning. There are two main reconstruction methods at present, one is based on the changing of the data
construction, represented by the fine-to-coarse method, the other one takes the correlation of the IMF into
consideration, for example, calculating the correlation between the marginal spectrums of different IMF. Empirical
results in this paper are as follows. (1) It is not reasonable that the residue obtained by the EMD is regarded as the
trend of the original data. (2) By fine-to-coarse, all the IMFs can be reconstructed to three new time series, which
are treated as high-frequency mode, low-frequency mode and trend respectively, but the explanation for the real life
is not satisfactory. (3) Trend which is reconstructed based on the correlation of the IMF marginal spectrums can
describe the basic behaviour of the original data. Contrasted with fine-to-coarse, the results obtained by the second
method are more reasonable. (4) Comparing with fine-to-coarse, high-fluctuation process can be further identified
by “Method Two”.
The two reconstructed methods have their own advantages and disadvantages in practical application. The
internal unity and difference of two methods are studied in this paper. In future, more researches can be concentrated
on proposing a new reconstruction method that is more practical, explanatory and convincing.
Acknowledgements
First of all, the authors gratefully acknowledge the financial support from National Natural Science Foundation
of China (No. 71003091, 71133005, 71071148, and71373009), Youth Innovation Promotion Association of the
Chinese Academy of Sciences, KeyResearch Program of Institute of Policy and Management, Chinese Academy of
Sciences.
References
1. Huang NE, Shen Z, Long SR, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series
analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences1998;454(1971): 903-995.
272
Xiaoyang Yao et al. / Procedia Computer Science 31 (2014) 265 – 272
2. Wu Z, Huang NE. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in Adaptive Data Analysis
2009;1(01): 1-41.
3. Loutridis SJ. Damage detection in gear systems using empirical mode decomposition. Engineering Structures 2004;26(12): 1833-1841.
4. Lei Y, He Z, Zi Y. Application of the EEMD method to rotor fault diagnosis of rotating machinery. Mechanical Systems and Signal
Processing 2009;23(4): 1327-1338.
5. Cheng J, Yu D, Tang J, et al. Application of frequency family separation method based upon EMD and local Hilbert energy spectrum method
to gear fault diagnosis. Mechanism and Machine Theory 2008;43(6): 712-723.
6. Delechelle E, Nunes JC, Lemoine J. Empirical mode decomposition synthesis of fractional processes in 1D-and 2D-space. Image and Vision
Computing 2005;23(9): 799-806.
7. Wang L, Zhao Y, Chen F, et al. The 3D CT reconstruction algorithm to directly reconstruct multi-characteristic based on EMD. Measurement
2011;44(10): 2043-2048.
8. Rojas A, Gqrriz JM, Ramorez J, et al. Application of Empirical Mode Decomposition (EMD) on DaTSCAN SPECT images to explore
Parkinson Disease. Expert Systems with Applications 2013;40(7):2756-2766.
9. Li S, Zhou W, Yuan Q, et al. Feature extraction and recognition of ictal EEG using EMD and SVM. Computers in biology and medicine
2013;43(7):807-816.
10.Yu L, Lai KK, Wang SY, He KJ, 2007. “Oil price forecasting with an EMD-based multiscale neural network learning paradigm”, 7th
International Conference on Computational Science, Beijing, China.
11. Yu L, Wang SY, Lai KK. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Economics
2008;30(5): 2623-2635.
12. Ruan LF, Bao HJ. An empirical analysis of periodic fluctuations of real estate price based on EMD. Chinese Journal of Management
Science 2012;20(3):41-46.(in Chinese)
13. Zhang J L, Tan Z F, Li CJ.Short term electricity price forecasting of combined chaotic method. Chinese Journal of Management Science
2011;19(2):133-139.(in Chinese)
14. Li JP, Tang L, Sun XL, et al.Country risk forecasting for major oil exporting countries: A decomposition hybrid approach. Computers &
Industrial Engineering 2012;63(3):641-651.
15. Zhang X, Lai KK, Wang SY. A new approach for crude oil price analysis based on Empirical Mode Decomposition. Energy Economics
2008;30(3): 905-918.
16. Tang L, Li JP, Sun XL, et al.Multi-scale characteristics research of country risk based on mode composition. Management Review 2012;8:310. (in Chinese)
17. Li JP, Tang L, Sun XL, et al.Country risk forecasting for major oil exporting countries: A decomposition hybrid approach. Computers &
Industrial Engineering 2012;63(3):641-651
18. Yang Z, Ling BWK, Bingham C. Trend extraction based on separations of consecutive empirical mode decomposition components in
Hilbert marginal spectrum. Measurement 2013;46:2481-2491.
19. Hoti S, McAleer M, Shareef R. Modelling international tourism and country risk spillovers for Cyprus and Malta. Tourism Management
2007;28(6): 1472-1484.