International Journal of Computer Trends and Technology (IJCTT) – Volume 4, Issue 10 – October 2013

Bi-Directional Prefix Preserving Closed Extension for Linear Time Closed Pattern Mining

1 R. Sandeep Kumar, 2 R. Suhasini, 3 B. Sivaiah

1 PG Scholar, Department of Computer Science and Engineering, CMR College of Engineering and Technology, Hyderabad, A.P., India
2 Assistant Professor, Department of Computer Science and Engineering, CMR College of Engineering and Technology, Hyderabad, A.P., India
3 Associate Professor, Department of Computer Science and Engineering, CMR College of Engineering and Technology, Hyderabad, A.P., India
Abstract: In data mining, frequent closed patterns play an important role. We consider the frequent closed pattern discovery problem for structured data, building on an efficient algorithm, the Linear time Closed pattern Miner (LCM), for mining frequent closed patterns from large transaction databases. LCM is based on prefix-preserving closure (ppc) extension and depth-first search. As an extension of this model, we devise an itemset pruning process under support proportionality, which improves the computational and search-time scalability of closed itemset discovery. The proposed model can be labeled as bi-directional prefix preserving closed extension for LCM.

I. INTRODUCTION

Frequent pattern mining is one of the fundamental problems in data mining and has many applications, such as association rule mining and condensed representation of inductive queries. To handle frequent patterns efficiently, equivalence classes induced by their occurrences have been considered; closed patterns are the maximal patterns of an equivalence class. This paper addresses the problem of enumerating all frequent closed patterns. Many algorithms have been proposed for this problem. They are basically based on the enumeration of frequent patterns: they enumerate frequent patterns and output only those that are closed. The enumeration of frequent patterns has been studied well and can be done efficiently; many computational experiments show that, in practice, these algorithms take very short time per pattern on average.
However, as we show later, the number of frequent patterns can be exponentially larger than the number of closed patterns, so the computation time can be exponential in the size of the dataset for each closed pattern on average. The existing algorithms therefore use heuristic pruning to cut off non-closed frequent patterns. This pruning is not complete, however, so they may still take exponential time per closed pattern. Moreover, these algorithms have to store the previously obtained patterns in memory to avoid duplications, and some of them further use the stored patterns to check the closedness of patterns. This consumes much memory, sometimes exponential in both the size of the database and the number of closed patterns. In summary, the existing algorithms may take time and memory exponential in both the database size and the number of frequent closed patterns. This is not only a theoretical observation; it is supported by the results of the computational experiments of FIMI'03. When the number of frequent patterns is much larger than the number of frequent closed patterns, as for BMS-WebView-1 with small supports, the computation time of the existing algorithms is very large relative to the number of frequent closed patterns.

The frequency of an item set X in a transaction database D is the probability of X occurring in a transaction T ∈ D.
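For concreteness, the frequency measure just defined and the baseline strategy discussed above (enumerate the frequent item sets, then keep only the closed ones) can be sketched as follows. This is a minimal illustrative sketch, not the algorithm proposed in this paper; the function names and the list-of-frozensets database representation are our own assumptions.

```python
from itertools import combinations

def frequency(X, D):
    """Frequency (support) of item set X in database D:
    the fraction of transactions T in D that contain X."""
    return sum(1 for T in D if X <= T) / len(D)

def closed_frequent_itemsets(D, min_freq):
    """Baseline: enumerate all frequent item sets, then keep only the closed
    ones (those with no proper superset of equal frequency).
    Exponential in the worst case; shown only for illustration."""
    items = sorted({i for T in D for i in T})
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for X in combinations(items, k):
            X = frozenset(X)
            f = frequency(X, D)
            if f >= min_freq:
                frequent[X] = f
                found = True
        if not found:          # no frequent k-item set implies none larger either
            break
    # an item set is closed iff no superset has the same frequency
    return {X: f for X, f in frequent.items()
            if not any(X < Y and frequent[Y] == f for Y in frequent)}

# Example: transactions represented as frozensets of items
D = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("abc")]
print(closed_frequent_itemsets(D, min_freq=0.5))
```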
II. PROBLEM STATEMENT

Frequent closed pattern discovery is the problem of finding all the frequent closed patterns in a given data set, where the closed patterns are the maximal patterns of each equivalence class consisting of all frequent patterns with the same occurrence sets in the database. It is known that the number of frequent closed patterns is much smaller than that of frequent patterns on most real-world datasets, while the frequent closed patterns still carry the complete frequency information of all frequent patterns. Closed pattern discovery is therefore useful for improving both the performance and the comprehensibility of data mining. There is also a potential demand for efficient methods for extracting useful patterns from semi-structured data, so-called semi-structured data mining.

Proposed Solution: Although the LCM model is scalable because it uses minimal storage, its search process is not refined enough to perform well on dense pattern sets. We therefore attempt to devise a pattern pruning process under support proportionality, referred to as bi-directional prefix preserving closed extension for LCM.
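The problem statement above can be made concrete with the closure operator: all patterns with the same occurrence set form one equivalence class, and the closed pattern is the intersection of the transactions in that occurrence set, i.e., the unique maximal member of the class. The sketch below is illustrative only; the helper names and database representation are our own assumptions.

```python
def occurrence_set(X, D):
    """Indices of the transactions of D that contain item set X."""
    return frozenset(tid for tid, T in enumerate(D) if X <= T)

def closure(X, D):
    """Closure of X: intersection of all transactions containing X.
    It is the maximal item set with the same occurrence set as X."""
    occ = occurrence_set(X, D)
    return frozenset.intersection(*(D[tid] for tid in occ)) if occ else frozenset()

def is_closed(X, D):
    """X is closed iff it equals its own closure."""
    return closure(X, D) == X

D = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("abc")]
assert occurrence_set(frozenset("a"), D) == occurrence_set(frozenset("ab"), D)
assert closure(frozenset("a"), D) == frozenset("ab")   # {a} and {a,b} share one class
assert is_closed(frozenset("ab"), D) and not is_closed(frozenset("a"), D)
```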
III. SYSTEM MODULES

Verifying item sets: This module verifies whether the item sets are valid or not. An item set has a couple of states: in a single transaction record, an item may or may not be present in that record, and the corresponding proposition may be positive or negative. If an item set is observed in a transaction record, the analogous proposition is true. Item sets are mapped to propositions p and q according to whether each item set is observed or not observed in a single transaction.

Identifying LCM Rules: In this module we derive the approach of mapping an LCM rule to an equivalence. A complete mapping between the two is realized in three progressive steps, where every step builds on the result of the previous step. In the initial step, the item sets in an implication are mapped to propositions. The item sets of an LCM rule can be either observed or not observed, so in an implication a proposition may be positive or negative. Analogously, an item set mapped to a true proposition corresponds to its observation in the transaction records.
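One way to read the mapping described above is sketched below: per transaction, an item set is mapped to the proposition "this item set is observed in the transaction", and implications and equivalences between two such propositions can then be evaluated record by record. The helper names and the support measure used here are hypothetical illustrations, since this paper does not give the formal definitions.

```python
def proposition(X, T):
    """Proposition for item set X in transaction T:
    True (positive) if X is observed in T, False (negative) otherwise."""
    return X <= T

def implication_support(X, Y, D):
    """Fraction of transactions in which the implication p -> q holds,
    where p and q are the propositions for item sets X and Y."""
    return sum(1 for T in D if (not proposition(X, T)) or proposition(Y, T)) / len(D)

def equivalence_support(X, Y, D):
    """Fraction of transactions in which p <-> q holds,
    i.e. X and Y are either both observed or both absent."""
    return sum(1 for T in D if proposition(X, T) == proposition(Y, T)) / len(D)

D = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("abc")]
print(implication_support(frozenset("a"), frozenset("b"), D))  # a -> b holds in every record here
print(equivalence_support(frozenset("a"), frozenset("c"), D))
```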
Applying Inference Approach: In the inference approach, BackScan search-space pruning in frequent closed sequence mining is trickier than in frequent closed item set mining. A depth-first-search-based closed itemset mining algorithm such as CLOSET can stop growing a prefix item set once it finds that this itemset can be absorbed by an already mined closed item set. The BackScan pruning method first checks whether the current prefix can be pruned; if not, it computes the number of backward-extension items and calls the subroutine bide(Sp_SDB, Sp, min_sup, BEI, FCS). This subroutine recursively calls itself and works as follows. For a prefix Sp, it scans its projected database Sp_SDB once to find the locally frequent items and computes the number of forward-extension items; if there is neither a backward-extension item nor a forward-extension item, it outputs Sp as a frequent closed sequence. It then grows Sp with each locally frequent item in lexicographic order to get a new prefix and builds the pseudo-projected database for the new prefix. For each new prefix, it first checks whether it can be pruned; if not, it computes the number of backward-extension items and calls itself, until no further prefixes remain.

Fig: Example of all closed patterns and their ppc extensions. Core indices are circled.

Deriving ELCM Rules from mapped LCM rules: The pseudo implications of equivalences can be further refined into a concept called ELCM rules. We note that not all pseudo implications of equivalences can be created using item sets X and Y. Nonetheless, if one pseudo implication of equivalence can be created, then another pseudo implication of equivalence also coexists. Two pseudo implications of equivalences always exist as a pair because they are created from, and share, the same conditions. ELCM rules meet the necessary and sufficient conditions and have the truth-table values of logical equivalence; by definition, an ELCM rule consists of the pair of pseudo implications of equivalences that have higher support values compared to the other two pseudo implications of equivalences. Each pseudo implication of equivalence is an LCM rule with the additional property that it can be mapped to a logical equivalence.
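The inference procedure described above follows a BIDE-style bi-directional extension check. The sketch below mirrors those steps in simplified form: for each frequent prefix it tests forward-extension items (appendable at the end with the same support) and backward-extension items (insertable at an earlier position with the same support), and outputs the prefix as a closed sequence when neither exists. The BackScan pruning and pseudo-projected databases of the real algorithm are omitted, and the function names are our own; this is an illustrative sketch, not the authors' implementation.

```python
def is_subsequence(pattern, seq):
    """True if `pattern` occurs in `seq` as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)

def support(pattern, sdb):
    """Number of sequences in the database containing `pattern`."""
    return sum(1 for seq in sdb if is_subsequence(pattern, seq))

def closed_sequences(sdb, min_sup):
    """Enumerate frequent closed sequences by prefix growth (min_sup >= 1).

    A frequent prefix Sp is closed when it has no forward-extension item
    (appendable at the end with the same support) and no backward-extension
    item (insertable at an earlier position with the same support)."""
    items = sorted({x for seq in sdb for x in seq})
    results = []

    def grow(prefix):
        sup = support(prefix, sdb)
        has_forward = any(support(prefix + [i], sdb) == sup for i in items)
        has_backward = any(
            support(prefix[:pos] + [i] + prefix[pos:], sdb) == sup
            for pos in range(len(prefix)) for i in items
        )
        if not has_forward and not has_backward:
            results.append((tuple(prefix), sup))
        # grow the prefix with each locally frequent item, in lexicographic order
        for i in items:
            if support(prefix + [i], sdb) >= min_sup:
                grow(prefix + [i])

    for i in items:
        if support([i], sdb) >= min_sup:
            grow([i])
    return results

# Example: each sequence is a list of items
sdb = [list("cabc"), list("abc"), list("cb"), list("abcb")]
print(closed_sequences(sdb, min_sup=2))
```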
Fig: Basic algorithm for enumerating frequent closed patterns.

Fig: Description of Algorithm LCM.

IV. RELATED WORK

This section shows the results of computational experiments for evaluating the practical performance of our algorithms on real-world and synthetic datasets. The datasets are: retail and accidents from the FIMI'03 site (http://fimi.cs.helsinki.fi/); T10I4D100K from the IBM Almaden Quest research group website; connect and pumsb from the UCI ML repository (at http://www.ics.uci.edu/mlearn/MLRepository.html); the click-stream data of Ferenc Bodon (kosarak); and BMS-WebView-1 and BMS-POS from KDD-CUP 2000 [11] (at http://www.ecn.purdue.edu/KDDCUP/). To evaluate the efficiency of ppc extension and the practical improvements, we implemented several algorithms as follows:
- freqset: algorithm using straightforward frequent pattern enumeration;
- straight: straightforward implementation of LCM (frequency counting by tuples);
- occ: LCM with occurrence deliver;
- occ+dbr: LCM with occurrence deliver and anytime database reduction for both frequency counting and the closure operation.
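As a point of reference for the variants listed above, the core of LCM, depth-first enumeration by prefix-preserving closure (ppc) extension, can be sketched as follows. This is a minimal, unoptimized illustration (no occurrence deliver and no database reduction), with our own function names and a list-of-frozensets database representation assumed.

```python
def lcm_closed_itemsets(db, min_sup):
    """Enumerate all frequent closed item sets by ppc extension.

    db      : list of transactions, each a frozenset of numeric items
    min_sup : minimum support as an absolute transaction count (>= 1)

    Each closed set is reached exactly once, so nothing has to be stored
    for duplicate checking; memory stays linear in the input size."""
    items = sorted({i for t in db for i in t})
    results = []

    def dfs(P, occ, core_i):
        # P is a frequent closed item set, occ its occurrence set (transaction ids)
        results.append((P, len(occ)))
        for i in items:
            if i <= core_i or i in P:
                continue
            occ_q = [tid for tid in occ if i in db[tid]]
            if len(occ_q) < min_sup:
                continue
            # closure of P ∪ {i}: intersection of the transactions containing it
            Q = frozenset.intersection(*(db[tid] for tid in occ_q))
            # prefix-preserving condition: the closure may not introduce
            # any item smaller than i that is not already in P
            if any(j < i and j not in P for j in Q):
                continue
            dfs(Q, occ_q, i)   # i becomes the core index of Q

    if db and len(db) >= min_sup:
        root = frozenset.intersection(*db)          # closure of the empty set
        dfs(root, list(range(len(db))), float("-inf"))
    return results

# Example: items encoded as integers
db = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 2, 3})]
print(lcm_closed_itemsets(db, min_sup=2))
```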
FIG Datasets: AvTrSz means average transaction size.

V. CONCLUSION

We addressed the problem of enumerating all frequent closed patterns in a given transaction database and proposed an efficient algorithm, LCM, to solve it, which uses memory linear in the input size; that is, the algorithm does not store the previously obtained patterns in memory. The main contribution of this paper is the prefix-preserving closure extension, which combines tail extension and the closure operation to realize direct enumeration of closed patterns. We recently studied frequent substructure mining from ordered and unordered trees based on a deterministic tree expansion technique called the rightmost expansion. There have also been pioneering works on closed pattern mining in sequences and graphs. It would be an interesting future problem to extend the framework of prefix-preserving closure extension to such tree and graph mining.

REFERENCES:

1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A. I. Verkamo, Fast Discovery of Association Rules, In Advances in Knowledge Discovery and Data Mining, MIT Press, 307-328, 1996.
2. T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, S. Arikawa, Efficient Substructure Discovery from Large Semi-structured Data, In Proc. SDM'02, SIAM, 2002.
3. T. Asai, H. Arimura, K. Abe, S. Kawasoe, S. Arikawa, Online Algorithms for Mining Semistructured Data Stream, In Proc. IEEE ICDM'02, 27-34, 2002.
4. T. Asai, H. Arimura, T. Uno, S. Nakano, Discovering Frequent Substructures in Large Unordered Trees, In Proc. DS'03, 47-61, LNAI 2843, 2003.
5. R. J. Bayardo Jr., Efficiently Mining Long Patterns from Databases, In Proc. SIGMOD'98, 85-93, 1998.
6. Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, L. Lakhal, Mining Frequent Patterns with Counting Inference, SIGKDD Explr., 2(2), 66-75, Dec. 2000.
7. B. Goethals, The FIMI'03 Homepage, http://fimi.cs.helsinki.fi/, 2003.
8. E. Boros, V. Gurvich, L. Khachiyan, K. Makino, On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets, In Proc. STACS 2002, 133-141, 2002.
9. D. Burdick, M. Calimlim, J. Gehrke, MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases, In Proc. ICDE 2001, 443-452, 2001.
10. J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, In Proc. SIGMOD'00, 1-12, 2000.
11. R. Kohavi, C. E. Brodley, B. Frasca, L. Mason, Z. Zheng, KDD-Cup 2000 Organizers' Report: Peeling the Onion, SIGKDD Explr., 2(2), 86-98, 2000.
12. H. Mannila, H. Toivonen, Multiple Uses of Frequent Sets and Condensed Representations, In Proc. KDD'96, 189-194, 1996.
13. N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Efficient Mining of Association Rules Using Closed Itemset Lattices, Inform. Syst., 24(1), 25-46, 1999.
14. N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering Frequent Closed Itemsets for Association Rules, In Proc. ICDT'99, 398-416, 1999.
15. J. Pei, J. Han, R. Mao, CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, In Proc. DMKD'00, 21-30, 2000.