Mining of Frequent Itemsets from Streams of Uncertain Data
Carson Kai-Sang Leung, Boyu Hao
ICDE 2009
&
A Tree-Based Approach for Frequent Pattern Mining from Uncertain Data
Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk
PAKDD 2008
Outline
- Motivation
- Related Work
- Method
  - UF-streaming
  - SUF-growth
  - UF-growth Improvement
- Experimental Result
- Conclusion
Motivation
ICDE:
1. Can we handle streams of uncertain data?
2. Given streams of uncertain data, how can we effectively capture their important contents?
PAKDD:
1. Can we avoid generating candidates at all?
2. Since tree-based algorithms for handling precise data are usually faster than their Apriori-based counterparts, is this also the case when handling uncertain data?
Related Work
Each item x in a transaction ti carries an existential probability P(x, ti), the likelihood that x is actually present in ti.
Under the "possible world" interpretation of uncertain data, there are two possible worlds for an item x and a transaction ti:
(i) x ∈ ti, with probability P(x, ti)
(ii) x ∉ ti, with probability 1 - P(x, ti)
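The two possible worlds per item compose across a whole transaction: each item is independently present or absent, and the world probabilities multiply. A minimal Python sketch, using the values {a:0.9, d:0.8} from transaction t5 of the running example (the dict encoding is this sketch's own choice):

```python
from itertools import product

t = {"a": 0.9, "d": 0.8}  # one uncertain transaction (values as in t5)

# Enumerate every possible world: each item is independently present
# (probability P(x, t)) or absent (probability 1 - P(x, t)).
worlds = []
for presence in product([True, False], repeat=len(t)):
    items, p = set(), 1.0
    for (x, px), there in zip(t.items(), presence):
        p *= px if there else 1.0 - px
        if there:
            items.add(x)
    worlds.append((items, p))

print(round(sum(p for _, p in worlds), 9))   # world probabilities sum to 1.0
# Mass of the worlds containing both a and d = P(a, t) * P(d, t):
exp_sup_ad = sum(p for items, p in worlds if {"a", "d"} <= items)
print(round(exp_sup_ad, 2))                  # 0.9 * 0.8 = 0.72
```

The four worlds here have probabilities 0.72, 0.18, 0.08, and 0.02; the support of {a,d} in the single world containing both items is exactly the product of the two existential probabilities.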
Related Work
The expected support of an itemset X in a transaction database TDB of n transactions is
expSup(X) = Σ_{i=1..n} ∏_{x∈X} P(x, ti)
i.e., per transaction, multiply the existential probabilities of X's items, then sum over all transactions.
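The formula can be sanity-checked with a short sketch over the first three transactions of the running example (the dict-per-transaction encoding is this sketch's own choice, not the papers' data structure):

```python
from math import prod

def exp_sup(itemset, tdb):
    """expSup(X): for each transaction, multiply the existential
    probabilities of X's items (transactions missing any item of X
    contribute 0), then sum over all transactions."""
    return sum(prod(t[x] for x in itemset)
               for t in tdb
               if all(x in t for x in itemset))

tdb = [
    {"a": 0.9, "d": 0.8, "e": 0.7, "f": 0.2},  # t1
    {"a": 0.9, "c": 0.7, "d": 0.7, "e": 0.6},  # t2
    {"b": 1.0, "c": 0.9},                      # t3
]
print(round(exp_sup({"a"}, tdb), 2))       # 0.9 + 0.9 = 1.8
print(round(exp_sup({"a", "e"}, tdb), 2))  # 0.9*0.7 + 0.9*0.6 = 1.17
```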
UF-streaming
Transactions (the stream is processed in batches of three: first = {t1, t2, t3}, second = {t4, t5, t6}, third = {t7, t8, t9}):
t1: {a:0.9, d:0.8, e:0.7, f:0.2}
t2: {a:0.9, c:0.7, d:0.7, e:0.6}
t3: {b:1.0, c:0.9}
t4: {b:1.0, c:0.9, d:0.3}
t5: {a:0.9, d:0.8}
t6: {b:1.0, d:0.7, e:0.1}
t7: {a:0.9, d:0.8}
t8: {b:1.0, c:0.9, d:0.3}
t9: {a:0.9, d:0.8, e:0.7}
First batch (preMinsup = 0.9), expected supports of single items:
a:1.8, b:1.0, c:1.6, d:1.5, e:1.3, f:0.2
e.g. a: (1×0.9) + (1×0.9) = 1.8
→ frequent items: a, b, c, d, e (f:0.2 < preMinsup is pruned)
[UF-tree built from the first batch: figure not reproduced]
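The per-batch step above, computing each item's expected support within one batch and keeping those that reach preMinsup, can be sketched as follows (a simplification: the actual algorithm builds a UF-tree per batch; function and variable names are mine):

```python
from collections import defaultdict

PRE_MINSUP = 0.9  # preMinsup from the slides

def frequent_items(batch, pre_minsup):
    """Expected support of each single item within one batch; keep the
    items whose expected support reaches pre_minsup."""
    sup = defaultdict(float)
    for t in batch:
        for item, p in t.items():
            sup[item] += p
    return {i: s for i, s in sup.items() if s >= pre_minsup}

first_batch = [
    {"a": 0.9, "d": 0.8, "e": 0.7, "f": 0.2},  # t1
    {"a": 0.9, "c": 0.7, "d": 0.7, "e": 0.6},  # t2
    {"b": 1.0, "c": 0.9},                      # t3
]
freq = frequent_items(first_batch, PRE_MINSUP)
print(sorted(freq))  # ['a', 'b', 'c', 'd', 'e']; f (expSup 0.2) is pruned
```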
UF-streaming (first batch, continued)
expSup({a,e}) = (1×0.9×0.6) + (1×0.9×0.7) = 1.17 ≥ preMinsup
expSup({d,e}) = (1×0.7×0.6) + (1×0.8×0.7) = 0.98 ≥ preMinsup
→ frequent: {a,e}, {d,e}
UF-tree for the {e}-projected DB: [figure not reproduced]
All frequent itemsets in the first batch: {a}, {a,d}, {a,e}, {b}, {c}, {d}, {d,e}, {e}
UF-streaming (second batch)
[Slide figure: the second batch {t4, t5, t6} processed in the same way; not reproduced]
UF-streaming (third batch)
[Slide figure: the third batch {t7, t8, t9} processed in the same way; not reproduced]
SUF-growth
[Slide figure: SUF-growth applied to the same stream of batches {t1, ..., t9}; not reproduced]
UF-growth Improvement
Improvement 1:
To reduce memory consumption and increase the chance of path sharing, we discretize and round the expected support of each tree node to k decimal places (e.g. 2 decimal places), so the range (0, 1] holds at most 10^k possible values.
Example: at most 100 possible expected support values, ranging from 0.01 to 1.00 inclusive, when k = 2.
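The effect of Improvement 1 can be illustrated directly: rounding to k = 2 decimal places maps nearby probabilities to the same stored value, so the corresponding tree nodes can share a path (the sample values below are illustrative, not from the papers):

```python
def discretize(p, k=2):
    """Round an expected-support value in (0, 1] to k decimal places,
    leaving at most 10**k distinct storable values."""
    return round(p, k)

# Five nearby node values collapse into two buckets after rounding,
# so their nodes can share tree paths:
values = [0.718751, 0.71999, 0.723, 0.708, 0.713]
rounded = [discretize(v) for v in values]
print(sorted(set(rounded)))  # [0.71, 0.72]
```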
UF-tree before vs. after Improvement 1: [tree figures not reproduced]
Improvement 2:
The improved UF-growth does not need to build subsequent UF-trees for any non-singleton pattern (e.g. no tree for the {d,e}-projected database), which reduces memory space.
Example:
From the {e}-projected DB, the non-singleton patterns to evaluate are {a,e}, {a,d,e}, {d,e}. First occurrence (count 1):
expSup({a,e}) = 1×0.9×0.72 = 0.648
expSup({a,d,e}) = 1×0.9×0.72×0.71875 = 0.46575
expSup({d,e}) = 1×0.71875×0.72 = 0.5175
Adding the remaining two occurrences (count 2):
expSup({a,e}) = 0.648 + (2×0.9×0.71875) = 1.94175
expSup({a,d,e}) = 0.46575 + (2×0.9×0.72×0.71875) = 1.39725
expSup({d,e}) = 0.5175 + (2×0.72×0.71875) = 1.5525
All frequent: {a}, {a,d}, {a,d,e}, {a,e}, {b}, {b,c}, {c}, {d}, {d,e}, {e}
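The arithmetic above suggests that each distinct tree path can be summarized as its items' (rounded) probabilities plus an occurrence count, with any non-singleton pattern evaluated from those summaries instead of a further UF-tree. A sketch under that reading (the `paths` layout is reverse-engineered from the slides' numbers, not the papers' actual structure):

```python
from math import prod

# Each distinct path in the {e}-projected tree: the items' (rounded)
# probabilities plus how many transactions follow that path.
paths = [
    ({"a": 0.9, "d": 0.71875, "e": 0.72}, 1),  # first occurrence
    ({"a": 0.9, "d": 0.72, "e": 0.71875}, 2),  # two more occurrences
]

def exp_sup(itemset, paths):
    """expSup(X) = sum over path summaries of count * product of
    X's probabilities -- no subsequent UF-tree is materialized."""
    return sum(count * prod(probs[x] for x in itemset)
               for probs, count in paths)

print(round(exp_sup({"a", "e"}, paths), 5))       # 0.648 + 1.29375 = 1.94175
print(round(exp_sup({"a", "d", "e"}, paths), 5))  # 0.46575 + 0.9315 = 1.39725
print(round(exp_sup({"d", "e"}, paths), 5))       # 0.5175 + 1.035 = 1.5525
```

The three printed values match the slide's expected supports, computed purely from the two path summaries.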
Experimental Result
ICDE: [performance charts not reproduced]
PAKDD: [performance charts not reproduced]
Conclusion
ICDE: Experimental results showed the effectiveness of our algorithms.
PAKDD: With our tree-based approach, users can mine frequent patterns from uncertain data effectively.