Motivation
∆-Closed Itemsets
Homogeneous Itemsets
T-Closure
∆T-Closure
Current and Future Work
Mining Frequent and Homogeneous Closed Itemsets
I. Hilali (1,2), T.-Y. Jen (1), D. Laurent (1), S. Ben Yahia (2), C. Marinica (1)
(1) ETIS - University Cergy Pontoise - France
(2) Faculté des Sciences de Tunis - Tunisia
Data Management in the Cloud Era Workshop
Kuala Lumpur - October 8, 2014
Outline
1. Motivation
2. ∆-Closed Itemsets
3. Homogeneous Itemsets
4. T-Closure
5. ∆T-Closure
6. Current and Future Work
Goal of the work
Given that the number of frequent itemsets is huge in general, our goal is to lower this number using
1. Semantic knowledge, assuming a taxonomy on the set of items
2. A generalization of the notion of closed itemsets
To this end, we consider
- A similarity measure between items induced by the taxonomy
- Homogeneous itemsets as sets of “similar enough” items
- A closure of itemsets with respect to support and homogeneity
Motivating Example
Tid   Items
t1    a1 a2 a3 s1 s2
t2    a2 n1 s1
t3    a1 a2 a3 s1 s2
t4    a1 a2 n1 v1
t5    a1 a2 n1 s2 v1
t6    s1 s2 v1
For a support threshold σ = 25%
- {a1, n1} and {a1, a2, n1, s2, v1} are frequent
- {a1, n1} is not closed
- {a1, a2, n1, s2, v1} is closed
It is well known that the support of {a1, n1} can be deduced from those of all frequent closed itemsets
Algorithms for mining frequent closed itemsets exist in the literature
Motivating Example (cont’d)
Consider the following taxonomy over the items
Item
  Food
    Vegetable: v1
    Seafood: s1, s2
  Beverage
    Alcoholic
      Bier
      Wine
      (a1, a2 and a3 are the leaves below Bier and Wine)
    Nonalcoholic: n1
Motivating Example (cont’d)
{a1, n1} is more homogeneous than {a1, a2, n1, s2, v1}
- a1 and n1 are beverages
- s2 is a seafood product and v1 a vegetable
We propose to focus on frequent and homogeneous itemsets, based on our previous work [DEXA 2011, ISIP 2013]
Then
- {a1, a2, n1, s2, v1} is discarded since it is not homogeneous
- {a1, n1} is kept since it is homogeneous and frequent
However, remember that only {a1, a2, n1, s2, v1} is mined as a closed itemset...
Motivating Example (cont’d)
To cope with this difficulty
1. We define and study the notion of closure with respect to homogeneity
2. We investigate how closures with respect to support and homogeneity can be combined
Based on these results it is possible to
1. Mine all frequent and homogeneous closed itemsets, the closure being with respect to support and homogeneity
2. Recover the support and the homogeneity degree of all frequent and homogeneous itemsets
Basic Definitions
A transaction table ∆ is a set of pairs τ = (Tid, I) where
- Tid is a transaction identifier and
- I is an itemset, also denoted by It(τ)
Given a transaction table ∆ and an itemset I
- Support of I: sup(I) = |{τ ∈ ∆ | I ⊆ It(τ)}| / |∆|
- I is frequent w.r.t. σ if sup(I) ≥ σ
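As a minimal sketch of the support definition, the snippet below encodes the transaction table of the motivating example as it is printed in these slides and evaluates sup(I) exactly with fractions. The names DELTA, sup and is_frequent are illustrative choices, not notation from the slides.

```python
from fractions import Fraction

# Transaction table of the motivating example (Tid -> itemset It(tau)).
DELTA = {
    "t1": {"a1", "a2", "a3", "s1", "s2"},
    "t2": {"a2", "n1", "s1"},
    "t3": {"a1", "a2", "a3", "s1", "s2"},
    "t4": {"a1", "a2", "n1", "v1"},
    "t5": {"a1", "a2", "n1", "s2", "v1"},
    "t6": {"s1", "s2", "v1"},
}

def sup(itemset, delta=DELTA):
    """sup(I) = |{tau in Delta | I is a subset of It(tau)}| / |Delta|."""
    itemset = set(itemset)
    covering = sum(1 for items in delta.values() if itemset <= items)
    return Fraction(covering, len(delta))

def is_frequent(itemset, sigma, delta=DELTA):
    """I is frequent w.r.t. sigma if sup(I) >= sigma."""
    return sup(itemset, delta) >= sigma

print(sup({"a1", "n1"}))                          # 1/3 (t4 and t5)
print(is_frequent({"a1", "n1"}, Fraction(1, 4)))  # True
```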
Basic Definitions (cont’d)
∆-closure of I
$$\Gamma_\Delta(I) = \bigcap_{\tau \in \Delta,\; I \subseteq It(\tau)} It(\tau)$$
I is ∆-closed if Γ∆(I) = I
Assuming that all frequent and ∆-closed itemsets and their supports are known, for every itemset I
1. I is frequent if and only if it is contained in at least one frequent and ∆-closed itemset
2. sup(I) is equal to the support of the least ∆-closed itemset containing I
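The ∆-closure can be sketched in a few lines of Python: intersect every transaction that contains I. The function names and the convention used for itemsets that occur in no transaction are assumptions made for this illustration.

```python
# Transactions of the motivating example, as a list of itemsets.
DELTA = [
    {"a1", "a2", "a3", "s1", "s2"},
    {"a2", "n1", "s1"},
    {"a1", "a2", "a3", "s1", "s2"},
    {"a1", "a2", "n1", "v1"},
    {"a1", "a2", "n1", "s2", "v1"},
    {"s1", "s2", "v1"},
]

def delta_closure(itemset, delta=DELTA):
    """Gamma_Delta(I): intersection of all transactions that contain I."""
    itemset = set(itemset)
    covering = [items for items in delta if itemset <= items]
    if not covering:
        # Assumed convention: an itemset occurring in no transaction
        # closes to the set of all items (empty intersection).
        return set().union(*delta)
    return set.intersection(*covering)

def is_delta_closed(itemset, delta=DELTA):
    return delta_closure(itemset, delta) == set(itemset)

def count(itemset, delta=DELTA):
    """Number of transactions containing the itemset."""
    return sum(1 for items in delta if set(itemset) <= items)

closure = delta_closure({"a1", "n1"})
print(is_delta_closed({"a1", "n1"}))          # False
print(count({"a1", "n1"}) == count(closure))  # True: the closure keeps the support
```

The last line illustrates why the supports of the closed itemsets suffice: an itemset and its closure are contained in exactly the same transactions.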
Item Similarity
We assume a taxonomy T on the items, that is, T is a tree whose leaves are the items
Similarity between items i and i′
$$sim(i, i') = \begin{cases} 1 & \text{if } i = i' \\ \dfrac{1 + HR(i, i')}{k + NSR(i, i')} & \text{otherwise} \end{cases}$$
where
- k is the depth of T
- HR(i, i′) is the level of lub(i, i′), the least upper bound of i and i′ in T
- NSR(i, i′) is the number of internal nodes connecting i and i′
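The similarity measure can be sketched as follows. The denominator follows the arithmetic of the worked example later in the talk, which adds k and NSR. The taxonomy is encoded as a child-to-father map; placing a1 under Bier and a2, a3 under Wine is an assumption, since the slides only imply that a2 and a3 share a father and that a1 has no sibling. All function names are illustrative.

```python
from fractions import Fraction

# Taxonomy as a child -> father map (leaf placement under Bier/Wine is assumed).
PARENT = {
    "Food": "Item", "Beverage": "Item",
    "Vegetable": "Food", "Seafood": "Food",
    "Alcoholic": "Beverage", "Nonalcoholic": "Beverage",
    "Bier": "Alcoholic", "Wine": "Alcoholic",
    "a1": "Bier", "a2": "Wine", "a3": "Wine",
    "n1": "Nonalcoholic", "s1": "Seafood", "s2": "Seafood", "v1": "Vegetable",
}

def ancestors(node):
    """Internal nodes from the father of `node` up to the root."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def level(node):
    return len(ancestors(node))  # the root has level 0

LEAVES = set(PARENT) - set(PARENT.values())
K = max(level(leaf) for leaf in LEAVES)  # depth of T

def sim(i, j):
    if i == j:
        return Fraction(1)
    up_i, up_j = ancestors(i), ancestors(j)
    lub = next(n for n in up_i if n in up_j)     # lowest common ancestor
    hr = level(lub)                              # HR(i, j)
    nsr = up_i.index(lub) + up_j.index(lub) + 1  # internal nodes on the path
    return Fraction(1 + hr, K + nsr)

print(sim("a2", "n1"), sim("a1", "s1"), sim("s1", "s2"))  # 1/4 1/10 3/5
```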
Itemset Homogeneity
Homogeneity degree of itemset I
$$hom(I) = \begin{cases} 1 & \text{if } I = \emptyset \\ \min_{i, i' \in I} sim(i, i') & \text{otherwise} \end{cases}$$
I is homogeneous with respect to h if hom(I) ≥ h
Monotonicity property
- If I1 ⊆ I2 then hom(I1) ≥ hom(I2)
As a consequence, frequent and homogeneous itemsets can be mined using a level-wise algorithm
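The level-wise idea can be illustrated with a small Apriori-style sketch: candidates of size n+1 are built from itemsets of size n, pruned when one of their subsets was rejected, and then checked against both thresholds. This is a generic illustration, not the authors' algorithm. The table is the one of the motivating example, the denominator of sim follows the worked example of the talk (k + NSR), and placing a1 under Bier and a2, a3 under Wine is an assumption.

```python
from fractions import Fraction
from itertools import combinations

DELTA = [
    {"a1", "a2", "a3", "s1", "s2"}, {"a2", "n1", "s1"},
    {"a1", "a2", "a3", "s1", "s2"}, {"a1", "a2", "n1", "v1"},
    {"a1", "a2", "n1", "s2", "v1"}, {"s1", "s2", "v1"},
]
# Child -> father map (leaf placement under Bier/Wine is assumed).
PARENT = {
    "Food": "Item", "Beverage": "Item", "Vegetable": "Food", "Seafood": "Food",
    "Alcoholic": "Beverage", "Nonalcoholic": "Beverage",
    "Bier": "Alcoholic", "Wine": "Alcoholic",
    "a1": "Bier", "a2": "Wine", "a3": "Wine",
    "n1": "Nonalcoholic", "s1": "Seafood", "s2": "Seafood", "v1": "Vegetable",
}

def ancestors(node):
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

K = max(len(ancestors(n)) for n in PARENT)  # depth of T

def sim(i, j):
    if i == j:
        return Fraction(1)
    up_i, up_j = ancestors(i), ancestors(j)
    lub = next(n for n in up_i if n in up_j)
    nsr = up_i.index(lub) + up_j.index(lub) + 1
    return Fraction(1 + len(ancestors(lub)), K + nsr)

def hom(itemset):
    """Minimum pairwise similarity; 1 for the empty set and for singletons."""
    pairs = combinations(itemset, 2)
    return min((sim(i, j) for i, j in pairs), default=Fraction(1))

def sup(itemset):
    return Fraction(sum(itemset <= t for t in DELTA), len(DELTA))

def mine(sigma, h):
    """Level-wise search for all frequent and homogeneous itemsets."""
    items = sorted(set().union(*DELTA))
    level = {frozenset([i]) for i in items}
    level = {c for c in level if sup(c) >= sigma and hom(c) >= h}
    found = {}
    while level:
        found.update({c: (sup(c), hom(c)) for c in level})
        candidates = {c | {i} for c in level for i in items if i not in c}
        # Prune: every subset one item smaller must have survived.
        level = {c for c in candidates
                 if all(c - {i} in level for i in c)
                 and sup(c) >= sigma and hom(c) >= h}
    return found

found = mine(Fraction(1, 4), Fraction(1, 5))
print(frozenset({"a1", "n1"}) in found)                    # True
print(frozenset({"a1", "a2", "n1", "s2", "v1"}) in found)  # False
```

The pruning step is sound because both support and homogeneity can only decrease when an itemset grows.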
Closure w.r.t. a Taxonomy T
T-closure of I
$$\Gamma_T(I) = \{\, i \in \mathcal{I} \mid (\exists \nu \in nodes(T))(\exists i' \in I)\,((\nu, i) \in links(T) \wedge (\nu, i') \in links(T)) \,\}$$
where $\mathcal{I}$ denotes the set of all items
Intuitively, ΓT(I) is the set of all items i for which there exists i′ in I such that i and i′ have the same “father” in T
I is T-closed if ΓT(I) = I
Properties of T-closure
1. ΓT is a closure operator
2. For every non-singleton itemset I, hom(I) = hom(ΓT(I))
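A minimal sketch of the T-closure: collect the fathers of the items of I, then return every item hanging below one of these fathers. The child-to-father encoding and the function names are illustrative; placing a1 under Bier and a2, a3 under Wine is an assumption consistent with the example of the talk.

```python
# Child -> father map (leaf placement under Bier/Wine is assumed).
PARENT = {
    "Food": "Item", "Beverage": "Item",
    "Vegetable": "Food", "Seafood": "Food",
    "Alcoholic": "Beverage", "Nonalcoholic": "Beverage",
    "Bier": "Alcoholic", "Wine": "Alcoholic",
    "a1": "Bier", "a2": "Wine", "a3": "Wine",
    "n1": "Nonalcoholic", "s1": "Seafood", "s2": "Seafood", "v1": "Vegetable",
}
ITEMS = set(PARENT) - set(PARENT.values())  # the leaves of T

def t_closure(itemset):
    """Gamma_T(I): all items sharing a father with some item of I."""
    fathers = {PARENT[i] for i in itemset}
    return {i for i in ITEMS if PARENT[i] in fathers}

def is_t_closed(itemset):
    return t_closure(itemset) == set(itemset)

print(sorted(t_closure({"a2", "n1"})))  # ['a2', 'a3', 'n1']
print(is_t_closed({"a1", "s1", "s2"}))  # True
```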
Example
Recall the following taxonomy over the items
Item
  Food
    Vegetable: v1
    Seafood: s1, s2
  Beverage
    Alcoholic
      Bier
      Wine
      (a1, a2 and a3 are the leaves below Bier and Wine)
    Nonalcoholic: n1
Example (cont’d)
For itemsets {a2, n1} and {a1, s1, s2}
- ΓT({a2, n1}) = {a2, a3, n1}, thus {a2, n1} is not T-closed
- ΓT({a1, s1, s2}) = {a1, s1, s2}, thus {a1, s1, s2} is T-closed
Regarding homogeneity
- sim(a2, n1) = (1 + 1) / (4 + 4) = 1/4
- sim(a1, s1) = sim(a1, s2) = (1 + 0) / (4 + 6) = 1/10
- sim(s1, s2) = (1 + 2) / (4 + 1) = 3/5
For h = 20%
- hom({a2, n1}) = hom({a2, a3, n1}) = 1/4, thus {a2, n1} and {a2, a3, n1} are homogeneous
- hom({a1, s1, s2}) = min(1/10, 3/5) = 1/10, thus {a1, s1, s2} is not homogeneous
Combining ∆- and T-closures
The problem is the following
- What to store in order to get an exact condensed representation of the frequent and closed itemsets?
Storing
- all ∆-closed itemsets and their support along with
- all T-closed itemsets with their homogeneity degree
is not a solution
We define a third closure Γ∆T as follows
Γ∆T(I) = Γ∆(I) ∩ ΓT(I)
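The combined closure is simply the intersection of the two closures, which can be sketched as follows on the table and taxonomy of the running example. Function names are illustrative, the convention for itemsets occurring in no transaction is an assumption, and so is the placement of a1 under Bier and of a2, a3 under Wine.

```python
DELTA = [
    {"a1", "a2", "a3", "s1", "s2"}, {"a2", "n1", "s1"},
    {"a1", "a2", "a3", "s1", "s2"}, {"a1", "a2", "n1", "v1"},
    {"a1", "a2", "n1", "s2", "v1"}, {"s1", "s2", "v1"},
]
# Child -> father map (leaf placement under Bier/Wine is assumed).
PARENT = {
    "Food": "Item", "Beverage": "Item", "Vegetable": "Food", "Seafood": "Food",
    "Alcoholic": "Beverage", "Nonalcoholic": "Beverage",
    "Bier": "Alcoholic", "Wine": "Alcoholic",
    "a1": "Bier", "a2": "Wine", "a3": "Wine",
    "n1": "Nonalcoholic", "s1": "Seafood", "s2": "Seafood", "v1": "Vegetable",
}
ITEMS = set().union(*DELTA)

def delta_closure(itemset):
    """Intersection of the transactions containing the itemset."""
    covering = [t for t in DELTA if set(itemset) <= t]
    # Assumed convention: no covering transaction -> all items.
    return set.intersection(*covering) if covering else set(ITEMS)

def t_closure(itemset):
    """Items sharing a father with some item of the itemset."""
    fathers = {PARENT[i] for i in itemset}
    return {i for i in ITEMS if PARENT[i] in fathers}

def delta_t_closure(itemset):
    """Gamma_DeltaT(I) = Gamma_Delta(I) intersected with Gamma_T(I)."""
    return delta_closure(itemset) & t_closure(itemset)

print(sorted(delta_t_closure({"a1", "n1"})))  # ['a1', 'n1']
print(sorted(delta_t_closure({"a3"})))        # ['a2', 'a3']
```

On this data {a1, n1} is its own ∆T-closure, so it is retained as a closed itemset even though it is not ∆-closed.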
Properties of the ∆T-Closure
1. Γ∆T is a closure operator
2. sup(I) = sup(Γ∆T(I))
3. For every non-singleton itemset I, hom(I) = hom(Γ∆T(I))
Given all frequent and homogeneous ∆T-closed itemsets with their support and homogeneity degree, for every itemset I
1. I is frequent and homogeneous if and only if it is contained in at least one frequent and homogeneous ∆T-closed itemset
2. sup(I) and hom(I) are respectively equal to the support and the homogeneity degree of the least ∆T-closed itemset containing I
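The recovery step can be sketched as a lookup in the stored closed itemsets. The sketch below builds the store by brute force over all non-empty itemsets of the running example (feasible only because there are seven items; this is not the mining algorithm of the talk), then answers queries from the store alone. Names such as closed_store and lookup are hypothetical. Singletons are special-cased because their homogeneity degree is 1 by definition, and the placement of a1 under Bier and of a2, a3 under Wine is an assumption.

```python
from fractions import Fraction
from itertools import combinations

DELTA = [
    {"a1", "a2", "a3", "s1", "s2"}, {"a2", "n1", "s1"},
    {"a1", "a2", "a3", "s1", "s2"}, {"a1", "a2", "n1", "v1"},
    {"a1", "a2", "n1", "s2", "v1"}, {"s1", "s2", "v1"},
]
# Child -> father map (leaf placement under Bier/Wine is assumed).
PARENT = {
    "Food": "Item", "Beverage": "Item", "Vegetable": "Food", "Seafood": "Food",
    "Alcoholic": "Beverage", "Nonalcoholic": "Beverage",
    "Bier": "Alcoholic", "Wine": "Alcoholic",
    "a1": "Bier", "a2": "Wine", "a3": "Wine",
    "n1": "Nonalcoholic", "s1": "Seafood", "s2": "Seafood", "v1": "Vegetable",
}
ITEMS = sorted(set().union(*DELTA))

def ancestors(node):
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

K = max(len(ancestors(n)) for n in PARENT)  # depth of T

def sim(i, j):
    up_i, up_j = ancestors(i), ancestors(j)
    lub = next(n for n in up_i if n in up_j)
    nsr = up_i.index(lub) + up_j.index(lub) + 1
    return Fraction(1 + len(ancestors(lub)), K + nsr)

def hom(itemset):
    pairs = combinations(itemset, 2)
    return min((sim(i, j) for i, j in pairs), default=Fraction(1))

def sup(itemset):
    return Fraction(sum(itemset <= t for t in DELTA), len(DELTA))

def closure(itemset):
    """Delta-T closure: Delta-closure intersected with T-closure."""
    covering = [t for t in DELTA if itemset <= t]
    by_support = set.intersection(*covering) if covering else set(ITEMS)
    fathers = {PARENT[i] for i in itemset}
    return frozenset(i for i in by_support if PARENT[i] in fathers)

def closed_store(sigma, h):
    """Brute-force store: closed itemset -> (support, homogeneity)."""
    store = {}
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            c = frozenset(combo)
            if closure(c) == c and sup(c) >= sigma and hom(c) >= h:
                store[c] = (sup(c), hom(c))
    return store

def lookup(itemset, store):
    """(sup, hom) of a non-empty itemset from the store, or None."""
    itemset = frozenset(itemset)
    supersets = [c for c in store if itemset <= c]
    if not supersets:
        return None  # not both frequent and homogeneous
    s, g = store[min(supersets, key=len)]  # least closed superset
    return s, (Fraction(1) if len(itemset) == 1 else g)

STORE = closed_store(Fraction(1, 4), Fraction(1, 5))
s, g = lookup({"a1", "a3"}, STORE)
print(s, g)                                             # 1/3 3/7
print(lookup({"a1", "a2", "n1", "s2", "v1"}, STORE))    # None
```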
Current and Future Work
Our current work is to implement and test an efficient algorithm for mining all frequent and homogeneous ∆T-closed itemsets
Our future work will be to investigate how to merge this work with
- our previous work on non-frequent itemsets [ISIP 2013]
- what Tao has just presented
Thank you for your attention
Questions?