
Data Mining and Data Warehousing

Frequent Pattern Mining - Frequent Pattern Mining Algorithms

Insights
Frequent pattern mining is an important data mining task and is widely used in
practical applications such as online shopping, spam detection, and intrusion
detection. The core of frequent pattern mining lies in extracting frequent item
sets, that is, item sets whose frequency of occurrence exceeds a given threshold.
However, the number of possible item sets grows exponentially with the number of
items (n items yield 2^n - 1 non-empty item sets), so counting the frequency of
every possible item set is infeasible.
Thus, efficient algorithms are required that can mine all frequent item sets in a
reasonable amount of time. This gave rise to algorithms like Apriori and
FP-Growth. These algorithms are based on a pruning rule: if an item set is not
frequent, then all of its supersets (the item sets obtained by adding items to it)
are also not frequent, and thus can be pruned.
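The pruning rule above can be illustrated with a minimal Apriori-style sketch. It is not taken from the course material: the function name `apriori`, the parameter `min_count`, and the toy transactions are hypothetical, and the sketch assumes an item set is frequent when its count is at least `min_count`.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: drop the candidate if any (k-1)-subset is infrequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count only the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data, assumed for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
]
frequent = apriori(transactions, min_count=3)
```

On this toy data, {bread, beer} is infrequent, so candidates such as {bread, diaper, beer} are discarded in the prune step without ever being counted against the transactions.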
Imagination
If memory is not a major constraint, can we achieve faster frequent pattern
mining than the state of the art?
Frequent pattern mining can find spam mails because of their high frequency of
occurrence. In such cases, frequent pattern mining algorithms can differentiate
spam mails from normal mails. But can the same algorithms differentiate between
all types of classes in any data set?
Glossary
• Association Rule: An association rule is of the form A => B, where A ⊂ I, B ⊂ I,
and A ∩ B = ∅. Here I is the set of all items.
• Confidence: For a given association rule A => B, confidence is the percentage
of transactions in D containing A that also contain B.
• Count: It is the number of occurrences of an itemset in a given set of
transactions.
• Itemset: It is a set of items where each item is chosen from a given finite set of
elements.
• k-Itemset: It is an itemset that contains k items.
• Support: The support of an itemset X in a database D is the fraction of
transactions of D in which itemset X occurs.
• Support(X) = Count(X)/|D|
• Support-count: Same as Count or Frequency.
• Prefix tree: An ordered tree used for storing string prefixes, which helps in
faster search and information retrieval.
• Header List: It is a data structure in which each entry consists of three fields:
the node-id, the count, and a pointer to the first occurrence of the node in the
prefix tree.
• FP-tree: A variant of the prefix tree data structure which has an additional
header list consisting of pointers to entries in the prefix tree.
• Conditional pattern base/FP-tree: It is a reduced FP-tree consisting of entries
with counts which are obtained by imposing certain conditions on the tree.
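The Count, Support, and Confidence entries above can be sketched in a few lines. The helper names `count`, `support`, and `confidence` and the toy database `D` are hypothetical. The sketch returns confidence as a fraction rather than a percentage and assumes the antecedent occurs in at least one transaction.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))

def support(itemset, transactions):
    """Support(X) = Count(X) / |D|."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of the transactions containing A that also contain B.

    Assumes Count(A) > 0; otherwise this raises ZeroDivisionError.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return count(both, transactions) / count(a, transactions)

# Hypothetical toy database, assumed for illustration only.
D = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
]
sup = support({"diaper", "beer"}, D)        # 3 of 5 transactions
conf = confidence({"diaper"}, {"beer"}, D)  # 3 of the 4 diaper transactions
```

Multiplying `conf` by 100 gives the percentage form used in the glossary definition.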

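The Prefix tree, Header List, FP-tree, and Conditional pattern base entries can be tied together with a small sketch. All names here (`Node`, `build_fp_tree`, `conditional_pattern_base`) and the toy data are hypothetical. As a convenience that the glossary does not mention, each header entry also keeps a pointer to the last node of an item so that new nodes can be linked in constant time. Items are assumed to be frequent when their count is at least `min_count`.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item = item        # node-id (None for the root)
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next = None        # next node in the tree carrying the same item

def build_fp_tree(transactions, min_count):
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}
    root = Node(None, None)
    # Header entry per item: [count, first node, last node (assumed extra)].
    header = {i: [c, None, None] for i, c in frequent.items()}
    for t in transactions:
        # Keep frequent items, ordered by descending count, ties by name.
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                entry = header[item]
                if entry[1] is None:
                    entry[1] = child       # first occurrence in the tree
                else:
                    entry[2].next = child  # extend the node-link chain
                entry[2] = child
            child.count += 1
            node = child
    return root, header

def conditional_pattern_base(item, header):
    """Prefix paths leading to `item`, each with the count of its item node."""
    paths = []
    node = header[item][1]
    while node is not None:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            paths.append((path[::-1], node.count))
        node = node.next
    return paths

# Hypothetical toy data, assumed for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
]
root, header = build_fp_tree(transactions, min_count=3)
beer_base = conditional_pattern_base("beer", header)
```

Sorting each transaction by descending item count makes transactions share prefixes where possible, which is what keeps the tree compact. The conditional pattern base is read off by following the header list's node links and walking from each node up to the root.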
Resources
• Apriori Algorithm Notes PDF (for your convenience, available inside the
Learn More Quadrant)
• Frequent Pattern Tree PDF (for your convenience, available inside the
Learn More Quadrant)
• Conditional Pattern Base PDF (for your convenience, available inside the
Learn More Quadrant)
• Maximal and Closed Itemsets PDF (for your convenience, available inside the
Learn More Quadrant)

References
• Chapter 5, Han and Kamber, Data Mining: Concepts and Techniques
• Chapters 6 and 7, Tan, Steinbach and Vipin Kumar, Introduction to Data Mining
• Chapter 2, Vikram Pudi and P. Radha Krishna, Data Mining
• R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets
of items in large databases. SIGMOD'93.
• R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.
• N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed
itemsets for association rules. ICDT'99.
• R. Agrawal and R. Srikant. Fast algorithms for mining association rules.
VLDB'94.
• Census Dataset: http://archive.ics.uci.edu/ml/datasets/Census+Income
• Jiawei Han, Jian Pei, and Y. Yin. Mining frequent patterns without candidate
generation. In Proceedings of ACM SIGMOD 2000.
• Prefix tree data structure: http://en.wikipedia.org/wiki/Trie
• Divide and Conquer strategy:
http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
• FP-Growth implementations:
http://www.csc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html
• FP-Growth package in Python: http://github.com/enaeseth/python-fp-growth
• Frequent Itemset Mining Repository: http://fimi.cs.helsinki.fi/src/
• A review of associative classification by Fadi Thabtah. PDF
• Integrating Classification with Association Rule Mining PDF
