Insights
Frequent pattern mining is an important data mining task and is widely used in
practical applications such as online shopping, spam detection, and intrusion
detection. The core of frequent pattern mining is extracting frequent item sets:
item sets whose frequency of occurrence (support) exceeds a given threshold.
However, the number of possible item sets grows exponentially with the number of
items (n items yield 2^n - 1 non-empty item sets), so checking the frequency of
every possible item set is infeasible.
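To make the scale of the problem concrete, here is a minimal brute-force sketch in Python. The transaction data, the helper name support_count, and the threshold value are all made up for illustration, and the sketch assumes the common convention that an item set is frequent when its support count is at least min_support.

```python
from itertools import combinations

# Hypothetical toy transaction database (illustrative data, not from the notes).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# All distinct items that appear in the database.
items = sorted(set().union(*transactions))
n = len(items)

# Brute force: enumerate every non-empty item set, 2**n - 1 of them in total.
all_itemsets = [
    frozenset(c) for k in range(1, n + 1) for c in combinations(items, k)
]

# Assumed convention: frequent means support count >= min_support.
min_support = 3
frequent = [s for s in all_itemsets if support_count(s, transactions) >= min_support]

print(f"{n} items -> {len(all_itemsets)} candidate item sets")
print(f"{len(frequent)} of them are frequent at min_support={min_support}")
```

With only 6 items the enumeration is trivial, but every additional item doubles the number of item sets to check, which is why brute force does not scale.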
Thus, efficient algorithms are required that can mine all frequent item sets in a
reasonable amount of time. This gave rise to algorithms like Apriori and
FP-Growth. These algorithms are based on a pruning rule: if an item set is not
frequent, then none of its supersets (the item sets obtained by adding items to
it) can be frequent either, so they can all be pruned without being counted.
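The pruning rule can be sketched as a level-wise search in the style of Apriori. This is a simplified illustration, not the tuned algorithm from the original paper: the function name apriori, the toy transactions, and the convention that frequent means support count at least min_support are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {item set: support count} for all frequent item sets (sketch)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate and keep only those meeting the threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(transactions, min_support=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this toy data, the candidate {bread, diapers, beer} is discarded in the prune step without scanning the transactions, because its subset {bread, beer} was already found to be infrequent.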
Imagination
If memory is not a major constraint, can frequent pattern mining be made faster
than the current state of the art?
Frequent pattern mining can detect spam mails because of their high frequency of
occurrence, so in such cases the algorithms can differentiate spam mails from
normal mails. But can the same algorithms differentiate between all types of
classes in any data set?
Glossary
Resources
Apriori Algorithm Notes (PDF)
Frequent Pattern Tree (PDF)
Conditional Pattern Base (PDF)
Maximal and Closed Itemsets (PDF)
For your convenience, all of these are available inside the Learn More Quadrant.
References
Chapter 5, Han and Kamber, Data Mining: Concepts and Techniques
Chapters 6 and 7, Tan, Steinbach, and Kumar, Introduction to Data Mining
Chapter 2, Vikram Pudi and P. Radha Krishna, Data Mining
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets
of items in large databases. SIGMOD'93.
R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed
itemsets for association rules. ICDT'99.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules.
VLDB'94.
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate
generation. SIGMOD'00.
Census Dataset: http://archive.ics.uci.edu/ml/datasets/Census+Income
Prefix tree data structure: http://en.wikipedia.org/wiki/Trie
Divide and conquer strategy: http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
FP-Growth implementations: http://www.csc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html
FP-Growth package in python: http://github.com/enaeseth/python-fp-growth
Frequent Itemset Mining Repository: http://fimi.cs.helsinki.fi/src/
Fadi Thabtah. A review of associative classification. (PDF)
Integrating Classification with Association Rule Mining. (PDF)