
Similarity Measure Based on Partial Information of Time Series
Advisor: Dr. Hsu
Graduate: You-Cheng Chen
Authors: Xiaoming Jin, Yuchang Lu, Chunyi Shi
Outline
Motivation
Objective
Introduction
Retrieval and Representation of Partial Information
System Setup
Results and Discussion
Conclusions
Personal Opinion
Motivation
What counts as a “good” similarity measurement is ultimately determined by the human user.
Objective
To propose a model for retrieving and representing partial information in time series.
Introduction
The model has three objectives:
Retrieve the partial information of interest
Represent the partial information in a compressed form
Allow most similarity models to be applied to this representation
3. Retrieval and Representation of Partial Information
3.1 General Description
Let $X = (X(1), \ldots, X(N))$ be a time series of length N.
Definition 1:
Use a rule F to decompose X into a set of time series:
$X \xrightarrow{F} (X'_1, \ldots, X'_T)$
3.1 General Description
Definition 2:
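(1) Segment X into sub-series of length r: $X_j = (X(jr - r + 1), \ldots, X(jr))$, j = 1, 2, ...
(2) Let $X'_{jk}$ be the k-th F-based component of sub-series $X_j$.
A mapping rule T maps each $X'_{jk}$ to a value $R_k(j)$; the sequence $R_k = (R_k(1), R_k(2), \ldots)$ is the k-th representing sequence.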
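Definition 2 leaves F and T abstract. The sketch below shows only the data flow: segment X into sub-series of length r, decompose each one with F, and map every component with T. The helper names `segment`, `representing_sequences`, `mean_residual` and `rms` are hypothetical, and the particular F (mean level plus residual) and T (root-mean-square amplitude) are illustrative assumptions, not the authors' choices.

```python
import math
from typing import Callable, List, Sequence

Component = List[float]


def segment(x: Sequence[float], r: int) -> List[List[float]]:
    """Split X into sub-series X_j = (X(jr - r + 1), ..., X(jr)), j = 1, 2, ...

    Assumes len(x) is a multiple of r; any trailing samples are ignored.
    """
    q = len(x) // r
    return [list(x[j * r:(j + 1) * r]) for j in range(q)]


def representing_sequences(
    x: Sequence[float],
    r: int,
    F: Callable[[List[float]], List[Component]],
    T: Callable[[Component], float],
) -> List[List[float]]:
    """Return R with R[k][j] = T(X'_jk), the k-th F-based component of X_j mapped by T.

    Indices k and j are 0-based here.
    """
    comps = [F(xj) for xj in segment(x, r)]  # comps[j][k] = X'_jk
    return [[T(c[k]) for c in comps] for k in range(len(comps[0]))]


# Hypothetical F: split a sub-series into its mean level and the residual around it.
def mean_residual(xj: List[float]) -> List[Component]:
    mean = sum(xj) / len(xj)
    return [[mean] * len(xj), [v - mean for v in xj]]


# Hypothetical T: root-mean-square amplitude of a component.
def rms(c: Component) -> float:
    return math.sqrt(sum(v * v for v in c) / len(c))


R = representing_sequences([1.0, 3.0, 5.0, 5.0, 2.0, 6.0], r=2, F=mean_residual, T=rms)
print(R)  # [[2.0, 5.0, 4.0], [1.0, 0.0, 2.0]]
```

Passing F and T as functions mirrors the slides' view of the model as pluggable parts; any decomposition that returns the same number of components per sub-series fits this shape.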
Definition 3:
$K = (K_1, \ldots, K_W)$ lists the orders (indices) of the W representing sequences of interest, and
$A_K = (A_{K_1}, \ldots, A_{K_W})$,
where $A_n$ is the degree of the user's interest in the n-th component. Then
$\sum_{n=1}^{W} A_{K_n} X'_{K_n}$
is the portion of partial information of interest.
Definition 4:
$R(m) = A_{K_{\mathrm{MOD}(m,W)}} \, R_{K_{\mathrm{MOD}(m,W)}}(\lceil m/W \rceil)$
is the full representing sequence (FRS) of the partial information $\sum_{n=1}^{W} A_{K_n} X'_{K_n}$.
Definition 5: Given two time series X and Y,
$MD(X, Y) = D(\mathrm{FRS}(X), \mathrm{FRS}(Y))$
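Definitions 4 and 5 interleave the W chosen representing sequences, sub-series by sub-series, and then compare two FRSs with a distance D. This is a minimal sketch under two assumptions: MOD(m, W) is read as ((m − 1) mod W) + 1 so that it runs over 1..W, and D defaults to Euclidean distance purely for illustration. `full_representing_sequence`, `euclidean` and `md` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Sequence


def full_representing_sequence(
    R: List[List[float]], K: Sequence[int], A: Dict[int, float]
) -> List[float]:
    """R(m) = A_{K_MOD(m,W)} * R_{K_MOD(m,W)}(ceil(m/W)), m = 1..W*q.

    Assumption: MOD(m, W) is read as ((m - 1) mod W) + 1, so it runs over 1..W.
    Component indices in K and the inner lists of R are 0-based.
    """
    W = len(K)
    q = len(R[K[0]])  # number of sub-series
    out = []
    for m in range(1, W * q + 1):
        k = K[(m - 1) % W]        # K_MOD(m,W), as a 0-based position in K
        j = math.ceil(m / W) - 1  # ceil(m/W), as a 0-based sub-series index
        out.append(A[k] * R[k][j])
    return out


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def md(frs_x: Sequence[float], frs_y: Sequence[float],
       D: Callable[[Sequence[float], Sequence[float]], float] = euclidean) -> float:
    """MD(X, Y) = D(FRS(X), FRS(Y)); the Euclidean default is a hypothetical choice."""
    return D(frs_x, frs_y)


R = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
frs = full_representing_sequence(R, K=[0, 2], A={0: 1.0, 2: 0.5})
print(frs)  # [1.0, 50.0, 2.0, 100.0, 3.0, 150.0]
```

With K = (0, 2), the FRS alternates the weighted values of components 0 and 2 for each sub-series, so its length is W times the number of sub-series.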
In summary, a representation model for partial information is specified by three parts:
Decomposition method F
Representation method T
Distance measurement D
Example 1
Use F to decompose the time series into two components:
(1) Local fluctuating movement S’1
(2) Global movement S’2
$R_1(j) = \begin{cases} 1 & \text{if } S'_{j1} \text{ is a fluctuation} \\ 0 & \text{otherwise} \end{cases}$
FRS(X) = $R_1$, and the length of FRS(X) is 200/8 = 25.
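The slides do not say how F separates the two movements or how a fluctuation is recognized. A minimal sketch, assuming a hypothetical F that takes each sub-series mean as the global movement $S'_2$ and the residual as the local movement $S'_1$, plus a hypothetical threshold `tau` on the residual's peak magnitude; `fluctuation_frs` is an illustrative name, and r = 8 matches the 200/8 length above.

```python
from typing import List, Sequence


def fluctuation_frs(x: Sequence[float], r: int = 8, tau: float = 1.0) -> List[int]:
    """Binary FRS for Example 1: R_1(j) = 1 if S'_j1 is a fluctuation, else 0.

    Hypothetical decomposition F: the global movement S'_j2 is the sub-series mean,
    and the local fluctuating movement S'_j1 is the residual around that mean.
    Hypothetical test: S'_j1 counts as a fluctuation when its peak |value| exceeds tau.
    """
    frs = []
    for j in range(len(x) // r):
        s = x[j * r:(j + 1) * r]
        mean = sum(s) / r              # global movement S'_j2 (a constant level)
        local = [v - mean for v in s]  # local fluctuating movement S'_j1
        frs.append(1 if max(abs(v) for v in local) > tau else 0)
    return frs


flat = [0.0] * 8
zigzag = [0.0, 5.0] * 4
print(fluctuation_frs(flat + zigzag))     # [0, 1]
print(len(fluctuation_frs([0.0] * 200)))  # 25
```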
3.2 Practical Method
Let H be the transform matrix of a given orthonormal discrete transform, so that $T_j = H X_j$.
We denote the discrete transforms of sub-series $X_j$ and $Y_j$ by $DT(X_j) = XT_j$ and $DT(Y_j) = YT_j$.
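As a concrete H, the sketch below builds the orthonormal DCT-II matrix (the DCT is the transform used in the experiments) and applies $T_j = H X_j$ to one sub-series. The function names `dct_matrix`, `dt` and `idt` are hypothetical; the inverse uses $H^{-1} = H^T$, which holds for any orthonormal H.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(r: int) -> Matrix:
    """Orthonormal DCT-II matrix: H[k][i] = c_k * cos(pi * (2i + 1) * k / (2r)),
    with c_0 = sqrt(1/r) and c_k = sqrt(2/r) for k > 0."""
    H = []
    for k in range(r):
        c = math.sqrt(1.0 / r) if k == 0 else math.sqrt(2.0 / r)
        H.append([c * math.cos(math.pi * (2 * i + 1) * k / (2 * r)) for i in range(r)])
    return H


def dt(H: Matrix, xj: List[float]) -> List[float]:
    """DT(X_j) = XT_j = H * X_j."""
    return [sum(h * v for h, v in zip(row, xj)) for row in H]


def idt(H: Matrix, tj: List[float]) -> List[float]:
    """Inverse transform X_j = H^{-1} * XT_j; for an orthonormal H, H^{-1} = H^T."""
    r = len(H)
    return [sum(H[k][i] * tj[k] for k in range(r)) for i in range(r)]


H = dct_matrix(4)
xt = dt(H, [1.0, 2.0, 3.0, 4.0])
print(round(xt[0], 6))                    # 5.0, i.e. (1 + 2 + 3 + 4) / sqrt(4)
print([round(v, 6) for v in idt(H, xt)])  # [1.0, 2.0, 3.0, 4.0]
```

The zeroth DCT coefficient is a scaled mean of the sub-series, while higher k capture faster oscillations, which is what makes selecting components by index meaningful.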
The k-th component of X is
$X'_k(n) = T_{\lceil n/r \rceil}(k) \cdot IB_k\left(n - (\lceil n/r \rceil - 1)\, r\right)$, where
$IB_m = (H^{-1}_{0,m}, H^{-1}_{1,m}, \ldots, H^{-1}_{r-1,m})^T$
The k-th representing sequence is
$R_k(m) = T_m(k)$
Then FRS(X) can be calculated as:
$R(m) = A_{K_{\mathrm{MOD}(m,W)}} \, T_{\lceil m/W \rceil}(K_{\mathrm{MOD}(m,W)})$
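With a transform in hand, each component is a transform coefficient times an inverse basis vector, and the FRS is just the chosen, weighted coefficients of each sub-series in turn. A minimal sketch assuming the DCT-II as H and 0-based indices; `dct_matrix`, `coefficients`, `component` and `frs` are hypothetical names. Because H is orthonormal, column k of $H^{-1} = H^T$ is row k of H, which is what the code uses for $IB_k$.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def dct_matrix(r: int) -> Matrix:
    """Orthonormal DCT-II matrix (rows are the basis vectors)."""
    return [[(math.sqrt(1.0 / r) if k == 0 else math.sqrt(2.0 / r))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * r)) for i in range(r)]
            for k in range(r)]


def coefficients(x: Sequence[float], H: Matrix) -> List[List[float]]:
    """T[j][k] = T_j(k), the k-th transform coefficient of sub-series X_j (0-based j, k)."""
    r = len(H)
    return [[sum(H[k][i] * x[j * r + i] for i in range(r)) for k in range(r)]
            for j in range(len(x) // r)]


def component(x: Sequence[float], H: Matrix, k: int) -> List[float]:
    """X'_k: on each sub-series, T_j(k) times IB_k, where IB_k is column k of H^{-1}.
    For an orthonormal H, column k of H^{-1} = H^T is row k of H."""
    r = len(H)
    return [tj[k] * H[k][i] for tj in coefficients(x, H) for i in range(r)]


def frs(x: Sequence[float], H: Matrix, K: Sequence[int], A: Dict[int, float]) -> List[float]:
    """R(m) = A_{K_MOD(m,W)} * T_{ceil(m/W)}(K_MOD(m,W)): the W chosen, weighted
    coefficients of sub-series 1, then of sub-series 2, and so on."""
    return [A[k] * tj[k] for tj in coefficients(x, H) for k in K]


x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
H = dct_matrix(4)
# The r components add back up to X, because H is invertible.
recon = [sum(vals) for vals in zip(*(component(x, H, k) for k in range(4)))]
print([round(v, 6) for v in recon])  # [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
print(frs(x, H, K=[0], A={0: 1.0}))  # [5.0, 5.0]
```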
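With D taken as the Euclidean (L2) distance:
$MD(X, Y) = \sqrt{\sum_{j=1}^{q} \sum_{n=1}^{W} A_{K_n}^2 \left(XT_j(K_n) - YT_j(K_n)\right)^2} = L_2\left(\sum_{n=1}^{W} A_{K_n} X'_{K_n},\ \sum_{n=1}^{W} A_{K_n} Y'_{K_n}\right)$
where q is the number of sub-series. The second equality holds because H is orthonormal, so the transform preserves Euclidean distance.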
In our experiments, we use the DCT (discrete cosine transform) as the transform.
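The two forms of MD can be checked numerically: the coefficient-space double sum should match the plain L2 distance between the weighted partial signals. A minimal sketch assuming the DCT-II as H, with made-up series x and y and arbitrary K and A chosen only for the check; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def dct_matrix(r: int) -> Matrix:
    """Orthonormal DCT-II matrix (rows are the basis vectors)."""
    return [[(math.sqrt(1.0 / r) if k == 0 else math.sqrt(2.0 / r))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * r)) for i in range(r)]
            for k in range(r)]


def coefficients(x: Sequence[float], H: Matrix) -> List[List[float]]:
    """T[j][k] = k-th transform coefficient of sub-series j (0-based)."""
    r = len(H)
    return [[sum(H[k][i] * x[j * r + i] for i in range(r)) for k in range(r)]
            for j in range(len(x) // r)]


def md_coefficients(x, y, H: Matrix, K: Sequence[int], A: Dict[int, float]) -> float:
    """sqrt(sum_j sum_n A_{K_n}^2 (XT_j(K_n) - YT_j(K_n))^2)."""
    XT, YT = coefficients(x, H), coefficients(y, H)
    return math.sqrt(sum(A[k] ** 2 * (xt[k] - yt[k]) ** 2
                         for xt, yt in zip(XT, YT) for k in K))


def partial_signal(x, H: Matrix, K: Sequence[int], A: Dict[int, float]) -> List[float]:
    """sum_n A_{K_n} X'_{K_n}: the series rebuilt from the chosen, weighted components only."""
    r = len(H)
    return [sum(A[k] * tj[k] * H[k][i] for k in K)
            for tj in coefficients(x, H) for i in range(r)]


def md_time_domain(x, y, H: Matrix, K: Sequence[int], A: Dict[int, float]) -> float:
    """L2 distance between the two weighted partial signals."""
    px, py = partial_signal(x, H, K, A), partial_signal(y, H, K, A)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(px, py)))


x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
y = [2.0, 3.0, 5.0, 1.0, 4.0, 4.0, 6.0, 2.0]
H, K, A = dct_matrix(4), [0, 2], {0: 1.0, 2: 0.5}
print(abs(md_coefficients(x, y, H, K, A) - md_time_domain(x, y, H, K, A)) < 1e-9)  # True
```

In practice the coefficient form is the cheaper one: it needs only W numbers per sub-series, never the reconstructed signals.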
4. System Setup
4.1 Evaluation of Similarity Measurement Based on
Partial Information
We use hierarchical agglomerative clustering (HAC) to cluster the FRSs.
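Two clusterings $C = \{C_1, \ldots, C_k\}$ and $S = \{S_1, S_2, \ldots\}$ are compared with
$Sim(C_i, S_j) = 2\,|C_i \cap S_j| / (|C_i| + |S_j|)$
$Sim(C, S) = \left(\sum_{i=1}^{k} \max_{j} Sim(C_i, S_j)\right) / k$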
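The two Sim formulas translate directly into set operations. A minimal sketch, assuming each cluster is given as a Python set of series identifiers; `sim_pair` and `sim` are hypothetical names and the clusters below are toy data.

```python
from typing import Hashable, List, Set


def sim_pair(ci: Set[Hashable], sj: Set[Hashable]) -> float:
    """Sim(C_i, S_j) = 2 |C_i ∩ S_j| / (|C_i| + |S_j|)."""
    return 2 * len(ci & sj) / (len(ci) + len(sj))


def sim(C: List[Set[Hashable]], S: List[Set[Hashable]]) -> float:
    """Sim(C, S) = (sum_i max_j Sim(C_i, S_j)) / k, with k = number of clusters in C."""
    return sum(max(sim_pair(ci, sj) for sj in S) for ci in C) / len(C)


C = [{"a", "b"}, {"c", "d"}]
S = [{"a", "b"}, {"c"}, {"d"}]
print(round(sim(C, S), 4))  # 0.8333, i.e. (1 + 2/3) / 2
```

Note that Sim(C, S) is not symmetric: here Sim(S, C) = (1 + 2/3 + 2/3) / 3 = 7/9, because the average runs over the clusters of the first argument.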
5. Results and Discussion
We used historical stock data and considered only the closing-price time series.
Step 1: Use the DCT to decompose each time series and represent its partial information.
Step 2: Use $E = (E_1, \ldots, E_r)$ to represent the chosen portion.
Step 3: Calculate K from E, then generate the FRS of each time series using K and A.
Step 4: Calculate MD and cluster the time series.
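Step 4 can be sketched as a small agglomerative clustering loop over FRSs. The slides do not state which HAC linkage was used; this sketch assumes average linkage and Euclidean distance between FRSs (MD with D = L2). `hac` and `euclidean` are hypothetical names, and the four two-element FRSs are toy data.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def hac(frss: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of FRSs (linkage choice is an assumption).

    Distances between FRSs use Euclidean distance, i.e. MD with D = L2.
    Returns clusters as lists of indices into `frss`.
    """
    n = len(frss)
    d = [[euclidean(frss[i], frss[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (p < q) and merge q into p.
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


frss = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(hac(frss, 2))  # [[0, 1], [2, 3]]
```

The pairwise distance matrix is computed once up front, so swapping in another distance D only changes how `d` is filled.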
11,15,14,10,19,10,14,17,14
3, 3, 3, 3, 2, 4, 4, 5, 5
Conclusions
The experimental results could help in designing more effective and more efficient similarity measurements.
Personal Opinion
The similarity measurement could be improved further by increasing the weight of the meaningful components.