Identifying Follow-correlation
Itemset-pairs
Shichao Zhang, Jilian Zhang , Xiaofeng Zhu , Zifang Huang
Department of Computer Science, Guangxi Normal University,
China
Published in ICDM’06
Outline




Introduction
Definition P3.1 (FCIP)
Algorithm
Conclusion
Introduction
 This paper proposes a new kind of
interesting pattern, denoted as P3.1 Itemset-Pairs or
Follow-Correlation Itemset-Pairs (FCIP),
which will be defined in detail, and aims to
develop techniques for mining them.

Definition 1.

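Item occurring sequence:
$S_I = \langle I_1, I_2, I_3, \ldots, I_t, \ldots, I_T \rangle$, where $I_t \in \{0,1\}$ and $t \in [1,T]$
$S_I^1 = \langle I_m, \ldots, I_n \rangle$, where $I_t = 1$ for $t = m, \ldots, n$ and $1 \le m \le n \le T$; $\mathrm{Len}(S_I^1) = n - m + 1$
$S_I^0 = \langle I_m, \ldots, I_n \rangle$, where $I_t = 0$ for $t = m, \ldots, n$ and $1 \le m \le n \le T$; $\mathrm{Len}(S_I^0) = n - m + 1$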
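A minimal sketch of how the runs in Definition 1 could be extracted from a 0/1 sequence. It assumes that runs are taken to be maximal (not extendable on either side), which the definition itself does not state; the function name `runs` and the sample sequence are hypothetical.

```python
from itertools import groupby


def runs(seq, value):
    """Return (m, n, length) for each maximal run of `value` in seq (1-indexed)."""
    found = []
    pos = 1  # position of the first element of the current group
    for bit, group in groupby(seq):
        length = len(list(group))
        if bit == value:
            # Len = n - m + 1, as in Definition 1
            found.append((pos, pos + length - 1, length))
        pos += length
    return found


# Hypothetical item occurring sequence with T = 7
S_I = [1, 1, 0, 0, 0, 1, 0]
one_runs = runs(S_I, 1)   # candidate S_I^1 subsequences
zero_runs = runs(S_I, 0)  # candidate S_I^0 subsequences
print(one_runs)
print(zero_runs)
```

Each tuple satisfies `length == n - m + 1`, matching the Len formula.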
Definition 2.

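Follow-Correlation Itemset-Pairs $\langle C, A \rangle$
$C = S_C^1 = \langle C_m, \ldots, C_n \rangle$, $1 \le m \le n \le T$
$A = S_A^1 = \langle A_k, \ldots, A_l \rangle$, where $k \in \{n, n+1\}$ and $k \le l \le T$
and
$C_{m-1} = C_{n+1} = 0$, if $1 < m \le n < T$
$A_{k-1} = A_{l+1} = 0$, if $1 < k \le l < T$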
Definition 2.(cont.)


The pair <C, A> is called a
Lag Follow-Correlation Itemset-Pair (LFCIP) if k = n+1
Strong Follow-Correlation Itemset-Pair (SFCIP) if k = n
Definition 2.(cont.)
For the sequences
A=’101010101010’
B=’010101010101’

$\langle A^1, B^1 \rangle$ and $\langle B^1, A^1 \rangle$ are different FCIPs.
The FCIP $\langle A^1, B^1 \rangle$ is an LFCIP and its frequency is 6, but
the frequency of $\langle B^1, A^1 \rangle$ is 5.
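A minimal sketch for checking the two frequencies on these sequences. It assumes that the frequency counts how often a maximal run of 1s in the first sequence is followed by a maximal run of 1s in the second (k = n + 1 for the lag case, k = n for the strong case), and that run lengths are given explicitly; the function names are hypothetical.

```python
from itertools import groupby


def one_runs(bits):
    """Maximal runs of '1' in a 0/1 string, as (start, end) pairs, 1-indexed."""
    found, pos = [], 1
    for bit, group in groupby(bits):
        length = len(list(group))
        if bit == "1":
            found.append((pos, pos + length - 1))
        pos += length
    return found


def fcip_frequency(cond, act, cond_len, act_len, lag=True):
    """Count condition runs of length cond_len followed by action runs of length act_len."""
    act_end_by_start = dict(one_runs(act))
    count = 0
    for m, n in one_runs(cond):
        if n - m + 1 != cond_len:
            continue
        k = n + 1 if lag else n  # lag: k = n + 1, strong: k = n
        l = act_end_by_start.get(k)
        if l is not None and l - k + 1 == act_len:
            count += 1
    return count


A = "101010101010"
B = "010101010101"
freq_ab = fcip_frequency(A, B, 1, 1)  # runs of A followed by runs of B
freq_ba = fcip_frequency(B, A, 1, 1)  # runs of B followed by runs of A
print(freq_ab, freq_ba)  # prints "6 5"
```

The last run of B ends at position 12, so no run of A can follow it, which is why the second count is one lower.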
Definition 4.

Longest P3.1 pattern

P =<C, A> m
k,k
n
Example 1.

Consider a given database D
Let A and B be two items in D
Example 1.(cont.)

Using our method, we can identify an
interesting follow-correlation itemset-pair $\langle A^1, B^1 \rangle$ with a frequency of 10.
Example 2.

Consider the same database D
Using the support-confidence framework, we
can obtain the association rule A → B with
confidence 0.333.
Example 2.(cont.)

Using our method, we can discover
interesting follow-correlation itemset-pairs:
$\langle A^3, B^1 \rangle$
$\langle A^4, B^2 \rangle$
Example 3.
IDIIIODDDIIIIODDD for stock A
IODDOIDODODDODODD for stock B
I: Increase, representing more than 20% of the daily value
D: Decrease, representing more than 10% of the daily value
O: Other kinds of changes
Encoding I as 1 and every other change as 0:
A is 10111 00001 11100 00
B is 10000 10000 00000 00
Omitting the positions where both A and B are zero:
A is 111101111
B is 100010000
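A short sketch of the preprocessing in this example. It assumes that I is encoded as 1 and every other symbol as 0, and that "omitting zero values" means dropping the positions where both stocks are 0; this reading reproduces the sequences shown on the slide. The helper names are hypothetical.

```python
def encode(changes, symbol="I"):
    """Map a string of daily changes to a 0/1 list: 1 where the change equals `symbol`."""
    return [1 if c == symbol else 0 for c in changes]


def drop_joint_zeros(a, b):
    """Remove the positions where both sequences are 0."""
    kept = [(x, y) for x, y in zip(a, b) if x or y]
    return [x for x, _ in kept], [y for _, y in kept]


def to_str(bits):
    return "".join(str(b) for b in bits)


stock_a = "IDIIIODDDIIIIODDD"
stock_b = "IODDOIDODODDODODD"

a_bits = encode(stock_a)
b_bits = encode(stock_b)
a_short, b_short = drop_joint_zeros(a_bits, b_bits)

print(to_str(a_bits), to_str(b_bits))
print(to_str(a_short), to_str(b_short))
```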

Example 5.
Given a customer transactional database of a supermarket
$\langle d^2, c^2 \rangle$
we call it the Strong
Follow-Correlation
Itemset-Pair (SFCIP).
$\langle d^2, c^3 \rangle$
we call it the Lag
Follow-Correlation
Itemset-Pair (LFCIP).
Example 5.(cont.)
2

3
2
2
<{d ,e }, c > and <{d ,e }, c >
This kind of P3.1 pattern
contains more than
one item
Example 5.(cont.)

For ease of discussion, in this paper we consider the situation
in which there is only one item in the Action itemset of a P3.1 pattern.
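$\langle f^1, a^3 \rangle$ $\langle f^1, a^1 \rangle$ $\langle a^1, f^3 \rangle$ $\langle a^1, f^1 \rangle$
$\langle g^3, b^1 \rangle$ $\langle b^3, g^1 \rangle$ $\langle b^1, g^3 \rangle$ $\langle b^1, g^2 \rangle$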
Algorithm step1

‘S’, ‘E’ and ‘P’ denote the Start position, the End position and the
successive Pointer to the next node, respectively
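A minimal sketch of a node structure with the three fields named above. The slide only names S, E and P; that each node stores one maximal run of 1s of an item, and the names `Node`, `build_run_list` and `to_pairs`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    S: int                      # Start position of the run (1-indexed)
    E: int                      # End position of the run
    P: Optional["Node"] = None  # successive Pointer to the next node


def build_run_list(bits):
    """Link one node per maximal run of 1s in a 0/1 list; return the head node."""
    head = tail = None
    start = None
    for t, bit in enumerate(bits, start=1):
        if bit == 1 and start is None:
            start = t  # a new run begins here
        if start is not None and (bit == 0 or t == len(bits)):
            end = t - 1 if bit == 0 else t  # run ended before t, or at the last position
            node = Node(start, end)
            if head is None:
                head = node
            else:
                tail.P = node
            tail = node
            start = None
    return head


def to_pairs(head):
    """Walk the list and collect (S, E) pairs."""
    pairs = []
    while head is not None:
        pairs.append((head.S, head.E))
        head = head.P
    return pairs


bits = [1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
run_pairs = to_pairs(build_run_list(bits))
print(run_pairs)  # prints [(1, 1), (3, 5), (10, 13)]
```

Storing only start and end positions keeps each list proportional to the number of runs rather than to the sequence length T.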
Algorithm step2
Conclusion
The method offers a simple way to find interesting
patterns.
