0304

A GRAPH-BASED ALGORITHM FOR FREQUENT CLOSED ITEMSETS MINING
Li Li; Donghai Zhai; Fan Jin
IEEE Systems & Information Engineering Design Symposium
Outline
• Introduction
• GFCG algorithm
• Experimental results
• Conclusion
Introduction
• Frequent closed itemsets are a subset of the frequent itemsets, yet they retain all the information of the frequent itemsets: every frequent itemset and its support can be derived from them.
• Frequent closed itemset mining is an important basis for association rule mining.
• This paper proposes a graph-based algorithm called GFCG (Graph-based Frequent Closed itemset Generation).
• GFCG scans the database only twice and avoids candidate set generation.
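The claim that closed itemsets keep all the information of the frequent itemsets can be made concrete: the support of any frequent itemset X equals the largest support among the closed itemsets that contain X, and an itemset with no closed superset is not frequent. The Python sketch below is illustrative only; `support_from_closed` is a hypothetical helper, and the closed itemsets are those of the five-transaction example database in this deck (minimum support 2), worked out by hand.

```python
from typing import Dict, FrozenSet, Optional, Set

# Frequent closed itemsets (with support counts) of the deck's example
# database at minsup = 2, computed by hand from the definition.
CLOSED: Dict[FrozenSet[int], int] = {
    frozenset({1}): 3,
    frozenset({5}): 4,
    frozenset({1, 3}): 2,
    frozenset({1, 5}): 2,
    frozenset({4, 5}): 3,
    frozenset({2, 4, 5}): 2,
}


def support_from_closed(itemset: Set[int],
                        closed: Dict[FrozenSet[int], int]) -> Optional[int]:
    """Hypothetical helper: support of `itemset`, or None if it is not frequent.

    supp(X) = max support over the closed itemsets that contain X.
    """
    target = frozenset(itemset)
    supports = [s for c, s in closed.items() if target <= c]
    return max(supports) if supports else None


print(support_from_closed({2}, CLOSED))     # 2, inherited from {2, 4, 5}
print(support_from_closed({4}, CLOSED))     # 3, inherited from {4, 5}
print(support_from_closed({3, 5}, CLOSED))  # None: {3, 5} is not frequent
```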
Algorithm
• GFCG uses a bit-vector structure: each frequent item i has a vector BVi with one bit per transaction.
• Step 1. In the first database scan, GFCG finds the frequent items.
• Step 2. In the second scan, GFCG sets the bits of every frequent item's bit vector.
• Step 3. In the graph construction phase, GFCG builds an association graph whose edges link pairs of frequent items that are frequent together.
• Step 4. Using the MineSamePrefixFreq procedure, GFCG finds all frequent closed itemsets.
GFCG algorithm
• Step 1: Find the frequent items
  – CreateFrequentItems ( D, F, minsup )
        // N is the number of transactions in D; every i.count starts at 0
        for ( j = 1; j <= N; j++ )
            for all items i in the jth transaction
                { i.count++; }
        F = { i | i is an item and i.count >= minsup }
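Step 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `create_frequent_items` is a hypothetical name, transactions are assumed to be Python sets, and the function returns each frequent item together with its count (the pseudocode's F keeps only the items).

```python
from collections import Counter
from typing import Dict, List, Set


def create_frequent_items(db: List[Set[int]], minsup: int) -> Dict[int, int]:
    """First database scan: count every item, keep those with count >= minsup."""
    counts: Counter = Counter()
    for transaction in db:          # one pass over the N transactions
        for item in transaction:
            counts[item] += 1       # i.count++
    return {item: c for item, c in counts.items() if c >= minsup}


# Example database from this deck (TIDs 100..500), minsup = 2.
DB = [{2, 4, 5}, {1, 3}, {2, 4, 5}, {1, 3, 5}, {1, 4, 5}]
print(sorted(create_frequent_items(DB, 2).items()))
# [(1, 3), (2, 2), (3, 2), (4, 3), (5, 4)]
```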
GFCG algorithm (cont.)
• Step 2: Build the bit vectors
  – CreateBitVector ( D, F )
        for all items i in F
            allocate BVi and set all bits in BVi to 0
        for ( j = 1; j <= N; j++ )
            for all frequent items i in the jth transaction
                set the jth bit of BVi to 1
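A possible Python rendering of Step 2, assuming bit vectors are stored as arbitrary-precision Python ints with bit j standing for the jth transaction (0-based). `create_bit_vectors` and `as_slide_string` are hypothetical names; the second one only prints a vector in the slide notation, where the leftmost character is the first transaction.

```python
from typing import Dict, List, Set


def create_bit_vectors(db: List[Set[int]], frequent: Set[int]) -> Dict[int, int]:
    """Second database scan: bit j of BV[i] is 1 iff transaction j contains i."""
    bv = {i: 0 for i in frequent}             # allocate BVi, all bits 0
    for j, transaction in enumerate(db):      # j = 0 .. N-1
        for i in transaction:
            if i in bv:                       # only frequent items have a vector
                bv[i] |= 1 << j               # set the jth bit of BVi
    return bv


def as_slide_string(vector: int, n: int) -> str:
    """Render a vector as on the slides: leftmost character = first transaction."""
    return "".join("1" if (vector >> j) & 1 else "0" for j in range(n))


DB = [{2, 4, 5}, {1, 3}, {2, 4, 5}, {1, 3, 5}, {1, 4, 5}]
BV = create_bit_vectors(DB, {1, 2, 3, 4, 5})
for item in sorted(BV):
    print(f"BV{item} = ({as_slide_string(BV[item], len(DB))})")
# BV1 = (01011)
# BV2 = (10100)
# BV3 = (01010)
# BV4 = (10101)
# BV5 = (10111)
```

The support of an item is then just the number of 1 bits, e.g. `bin(BV[4]).count("1")` gives 3.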
GFCG algorithm (cont.)
• Step 3: Build the association graph of the frequent items
  – CreateGraph ( F )
        let L be an order of the items in F;
        for all frequent items i ∈ F
            for all frequent items j ∈ F with i > j in L
                if ( number of 1s in BVi ∧ BVj ) >= minsup then
                    i.link.add(j);   // create edge i --> j
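The graph construction can be sketched as below, assuming `i > j` in the pseudocode means that j comes before i in the order L. The support of a pair is the number of 1s in BVi AND BVj, counted with `bin(x).count("1")`. `create_graph` is a hypothetical name, and the bit vectors are those of the deck's example database, parsed directly from the slide strings.

```python
from typing import Dict, List


def popcount(x: int) -> int:
    """Number of 1 bits in a bit vector."""
    return bin(x).count("1")


def create_graph(bv: Dict[int, int], order: List[int],
                 minsup: int) -> Dict[int, List[int]]:
    """Add edge i -> j when j precedes i in `order` and {i, j} is frequent."""
    rank = {item: r for r, item in enumerate(order)}
    link: Dict[int, List[int]] = {i: [] for i in order}
    for i in order:
        for j in order:
            if rank[i] > rank[j] and popcount(bv[i] & bv[j]) >= minsup:
                link[i].append(j)             # create edge i --> j
    return link


# Example bit vectors parsed from the slide strings. Bit order does not
# matter here, because AND and popcount are position-independent.
BV = {i: int(s, 2) for i, s in
      {1: "01011", 2: "10100", 3: "01010", 4: "10101", 5: "10111"}.items()}
print(create_graph(BV, [1, 2, 3, 4, 5], minsup=2))
# {1: [], 2: [], 3: [1], 4: [2], 5: [1, 2, 4]}
```

Each edge corresponds to one frequent 2-itemset of the example: {1, 3}, {2, 4}, {1, 5}, {2, 5}, {4, 5}.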
GFCG algorithm (cont.)
• Step 4: Find all frequent closed itemsets
  – MineSamePrefixFreq ( I, BVI, nsupp, C )
        let i be the last item in itemset I;
        for all j ∈ i.link
            I' = I ∪ { j };  BVI' = BVI ∧ BVj;
            let nnewsupp be the number of 1s in BVI'
            if ( nnewsupp >= minsup )
                if ( nnewsupp = nsupp ) { covered = TRUE; }   // I' has the same support, so I is not closed
                MineSamePrefixFreq ( I', BVI', nnewsupp, C )
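The slide does not show what happens to `covered` or to the result set C, so the Python sketch below fills that in with an assumption: an itemset that no same-support extension covers is kept as a candidate, and a final pass drops any candidate that has a proper superset with the same support. That extra pass is needed because edges only point to earlier items in L, so, for example, {2} is never extended to {2, 4, 5}. `mine_closed` and the candidate dictionary are hypothetical; the association graph is the one Step 3 yields for the example database at minimum support 2.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[int]


def popcount(x: int) -> int:
    """Number of 1 bits, i.e. the support encoded by a bit vector."""
    return bin(x).count("1")


def mine_same_prefix_freq(itemset: Tuple[int, ...], bv_i: int, nsupp: int,
                          link: Dict[int, List[int]], bv: Dict[int, int],
                          minsup: int, cand: Dict[Itemset, int]) -> None:
    """Depth-first extension of `itemset` along the edges of its last item."""
    covered = False
    last = itemset[-1]
    for j in link[last]:
        new_bv = bv_i & bv[j]                 # BVI' = BVI AND BVj
        nnewsupp = popcount(new_bv)
        if nnewsupp >= minsup:
            if nnewsupp == nsupp:
                covered = True                # I' has the same support: I is not closed
            mine_same_prefix_freq(itemset + (j,), new_bv, nnewsupp,
                                  link, bv, minsup, cand)
    if not covered:                           # assumption: uncovered itemsets become candidates
        cand[frozenset(itemset)] = nsupp


def mine_closed(link: Dict[int, List[int]], bv: Dict[int, int],
                minsup: int) -> Dict[Itemset, int]:
    """Hypothetical driver: one search per frequent item, then a subsumption filter."""
    cand: Dict[Itemset, int] = {}
    for i in link:
        mine_same_prefix_freq((i,), bv[i], popcount(bv[i]), link, bv, minsup, cand)
    # Assumed final pass: drop candidates with a same-support proper superset.
    return {x: s for x, s in cand.items()
            if not any(x < y and s == t for y, t in cand.items())}


# Example bit vectors parsed from the slide strings, and the Step 3 graph (minsup = 2).
BV = {i: int(s, 2) for i, s in
      {1: "01011", 2: "10100", 3: "01010", 4: "10101", 5: "10111"}.items()}
LINK = {1: [], 2: [], 3: [1], 4: [2], 5: [1, 2, 4]}

closed = mine_closed(LINK, BV, minsup=2)
for itemset, supp in sorted(closed.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
# [1] 3
# [5] 4
# [1, 3] 2
# [1, 5] 2
# [4, 5] 3
# [2, 4, 5] 2
```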
Example (1)
TID   Itemset
100   2 4 5
200   1 3
300   2 4 5
400   1 3 5
500   1 4 5
• Let the minimum support threshold be 2.
• Items 1, 2, 3, 4, and 5 are all frequent (supports 3, 2, 2, 3, and 4).
• The corresponding bit vectors (leftmost bit = TID 100) are
  BV1 = (01011),
  BV2 = (10100),
  BV3 = (01010),
  BV4 = (10101),
  BV5 = (10111).
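Example (2)
• Let the order be L = {1, 2, 3, 4, 5}.
• The frequent 2-itemsets are {1, 3}, {1, 5}, {2, 4}, {2, 5}, and {4, 5}.
• {2, 4, 5} (support 2) is the only frequent 3-itemset, and it is closed. The complete set of frequent closed itemsets is {1}, {5}, {1, 3}, {1, 5}, {4, 5}, and {2, 4, 5}.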
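To double-check the example against the definition, the brute-force Python sketch below enumerates every itemset of the five transactions, keeps the frequent ones, and then keeps those without a proper superset of equal support. It is exponential in the number of items, so it only suits tiny examples like this one, and it is not how GFCG works; `frequent_closed_bruteforce` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_closed_bruteforce(db: List[Set[int]],
                               minsup: int) -> Dict[FrozenSet[int], int]:
    """Closed = frequent, with no proper superset of the same support."""
    items = sorted(set().union(*db))
    freq: Dict[FrozenSet[int], int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in db if set(combo) <= t)   # support count
            if s >= minsup:
                freq[frozenset(combo)] = s
    return {x: s for x, s in freq.items()
            if not any(x < y and s == t for y, t in freq.items())}


DB = [{2, 4, 5}, {1, 3}, {2, 4, 5}, {1, 3, 5}, {1, 4, 5}]
result = frequent_closed_bruteforce(DB, 2)
for x, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(x), s)
# [1] 3
# [5] 4
# [1, 3] 2
# [1, 5] 2
# [4, 5] 3
# [2, 4, 5] 2
```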
Experiment
Conclusion
• The experimental evaluation and performance study on real and synthetic data sets show that the new algorithm outperforms Apriori-based algorithms.