
Association Rule Mining

The Task
Two ways of defining the task
General
Input: A collection of instances
Output: rules to predict the values of any attribute(s) (not
just the class attribute) from values of other attributes
E.g. if temperature = cool then humidity = normal
If the right hand side of a rule has only the class attribute,
then the rule is a classification rule
Distinction: classification rules are applied together, as a set of
rules
Specific - Market-basket analysis
Input: a collection of transactions
Output: rules to predict the occurrence of any item(s) from
the occurrence of other items in a transaction
E.g. {Milk, Diaper} -> {Beer}
General rule structure:
Antecedents -> Consequents
Association Rule Mining
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Market-Basket transactions
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules
{Diaper} -> {Beer},
{Milk, Bread} -> {Eggs, Coke},
{Beer, Bread} -> {Milk}
Implication means co-occurrence,
not causality!
Definition: Frequent Itemset
Itemset
A collection of one or more items
Example: {Milk, Bread, Diaper}
k-itemset
An itemset that contains k items
Support count (σ)
Frequency of occurrence of an itemset
E.g. σ({Milk, Bread, Diaper}) = 2
Support
Fraction of transactions that contain
an itemset
E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater
than or equal to a minsup threshold
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
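A minimal sketch in Python (the function names and data layout are mine, not from the slides) showing how the support count σ and the support s can be computed over the table above:

```python
# The five example transactions, each represented as a set of items.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 2/5 = 0.4
```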

Definition: Association Rule
Association Rule
An implication expression of the form X -> Y, where X and Y are itemsets
Example: {Milk, Diaper} -> {Beer}

Rule Evaluation Metrics
Support (s)
Fraction of transactions that contain both X and Y
Confidence (c)
Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} -> {Beer}
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
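Continuing the earlier sketch (reusing support_count and transactions), confidence follows directly from two support counts:

```python
def confidence(antecedent, consequent, transactions):
    """c(X -> Y) = sigma(X union Y) / sigma(X)."""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

# {Milk, Diaper} -> {Beer}: 2/3 ~= 0.67
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))
```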

Association Rule Mining Task
Given a set of transactions T, the goal of
association rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold

Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf
thresholds
Computationally prohibitive!
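For d distinct items, the total number of possible rules is R = 3^d − 2^(d+1) + 1; even for d = 6 this gives R = 729 − 128 + 1 = 602 rules, which is why exhaustive enumeration quickly becomes impractical.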
Mining Association Rules
Example of Rules:

{Milk, Diaper} -> {Beer} (s=0.4, c=0.67)
{Milk, Beer} -> {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} -> {Milk} (s=0.4, c=0.67)
{Beer} -> {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} -> {Milk, Beer} (s=0.4, c=0.5)
{Milk} -> {Diaper, Beer} (s=0.4, c=0.5)
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Observations:
All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
Rules originating from the same itemset have identical support but
can have different confidence
Thus, we may decouple the support and confidence requirements
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup

2. Rule Generation
Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still
computationally expensive

Power set
Given a set S, the power set P is the set of all subsets of S
Known property of power sets:
If S has n elements, P has N = 2^n elements.
Examples:
For S = {}, P = {{}}, N = 2^0 = 1
For S = {Milk}, P = {{}, {Milk}}, N = 2^1 = 2
For S = {Milk, Diaper},
P = {{}, {Milk}, {Diaper}, {Milk, Diaper}}, N = 2^2 = 4
For S = {Milk, Diaper, Beer},
P = {{}, {Milk}, {Diaper}, {Beer}, {Milk, Diaper}, {Diaper, Beer}, {Beer, Milk}, {Milk, Diaper, Beer}}, N = 2^3 = 8
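A sketch using Python's itertools to enumerate the power set (the function name is mine):

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, from {} up to s itself (2^n subsets in total)."""
    items = list(s)
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))

print(list(power_set({"Milk", "Diaper"})))
# [(), ('Milk',), ('Diaper',), ('Milk', 'Diaper')]  (order of items may vary)
```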
Brute Force approach to
Frequent Itemset Generation
For an itemset with 3
elements, we have 8 subsets
Each non-empty subset is a candidate
frequent itemset that needs to be matched
against each transaction
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

1-itemsets:
Itemset   Count
{Milk}    4
{Diaper}  4
{Beer}    3

2-itemsets:
Itemset         Count
{Milk, Diaper}  3
{Diaper, Beer}  3
{Beer, Milk}    2

3-itemsets:
Itemset               Count
{Milk, Diaper, Beer}  2
Important observation: the count of a subset can never be smaller than the count of the itemset that contains it!
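A brute-force sketch of this matching, reusing transactions and support_count from the earlier sketch:

```python
from itertools import combinations

# Enumerate every non-empty subset of {Milk, Diaper, Beer} and match
# it against each transaction to obtain its support count.
items = ["Milk", "Diaper", "Beer"]
for k in range(1, len(items) + 1):
    for candidate in combinations(items, k):
        print(k, candidate, support_count(set(candidate), transactions))
```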
Reducing Number of Candidates
Apriori principle:
If an itemset is frequent, then all of its subsets must also
be frequent

Apriori principle holds due to the following property
of the support measure:


Support of an itemset never exceeds the support of its
subsets
This is known as the anti-monotone property of support
∀X, Y: (X ⊆ Y) ⟹ s(X) ≥ s(Y)
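For example, in the table above σ({Milk, Diaper, Beer}) = 2, so every one of its subsets, such as {Milk, Diaper} or {Beer}, must appear in at least 2 transactions; conversely, if {Beer} were infrequent, no superset containing Beer could be frequent.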
Illustrating Apriori Principle
Minimum support count = 3

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)
Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Triplets (3-itemsets):
Itemset                Count
{Bread, Milk, Diaper}  3

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates
With support-based pruning: 6 + 6 + 1 = 13 candidates
Exercise: write all possible 3-itemsets and prune the list based on the infrequent 2-itemsets.
Apriori Algorithm
Method:

Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
Generate length (k+1) candidate itemsets from length k
frequent itemsets
Prune candidate itemsets containing subsets of length k that
are infrequent
Count the support of each candidate by scanning the DB
Eliminate candidates that are infrequent, leaving only those
that are frequent
Note: This algorithm makes several passes over the transaction list
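A compact sketch of this loop in Python, following the steps above (this is my sketch, not the textbook's implementation; it reuses the transactions list from the earlier sketch and treats minsup as a support count):

```python
from itertools import combinations

def apriori(transactions, minsup):
    # k = 1: count single items and keep the frequent ones
    counts = {}
    for t in transactions:
        for item in t:
            c = frozenset([item])
            counts[c] = counts.get(c, 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= minsup}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Generate (k+1)-candidates by joining frequent k-itemsets
        keys = list(frequent)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        # Prune candidates with any infrequent k-subset (Apriori principle)
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # One pass over the DB to count the surviving candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= minsup}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

for itemset, n in sorted(apriori(transactions, 3).items(),
                         key=lambda kv: (len(kv[0]), -kv[1])):
    print(set(itemset), n)
```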
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L
such that f -> L − f satisfies the minimum confidence requirement
If {A,B,C,D} is a frequent itemset, candidate rules:
ABC -> D, ABD -> C, ACD -> B, BCD -> A,
A -> BCD, B -> ACD, C -> ABD, D -> ABC,
AB -> CD, AC -> BD, AD -> BC, BC -> AD,
BD -> AC, CD -> AB

If |L| = k, then there are 2^k − 2 candidate association rules
(ignoring L -> {} and {} -> L)
Because rules are generated from frequent itemsets,
they automatically satisfy the minimum support threshold
Rule generation therefore only needs to check
the minimum confidence threshold
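A small sketch (names are mine) that enumerates these binary partitions for an arbitrary frequent itemset L:

```python
from itertools import combinations

def candidate_rules(L):
    """Yield all 2^k - 2 candidate rules f -> L - f for non-empty f strictly inside L."""
    L = frozenset(L)
    for r in range(1, len(L)):            # sizes of non-empty proper subsets f
        for f in combinations(L, r):
            f = frozenset(f)
            yield f, L - f                # rule f -> L - f

for lhs, rhs in candidate_rules({"A", "B", "C", "D"}):
    print(set(lhs), "->", set(rhs))       # 2^4 - 2 = 14 rules
```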
Rule Generation
How to efficiently generate rules from frequent itemsets?
In general, confidence does not have an anti-monotone property
c(ABC -> D) can be larger or smaller than c(AB -> D)

But confidence of rules generated from the same itemset has an
anti-monotone property
e.g., L = {A,B,C,D}:

c(ABC -> D) ≥ c(AB -> CD) ≥ c(A -> BCD)

Confidence is anti-monotone w.r.t. the number of items on the RHS of
the rule
Example: Consider the following two rules
{Milk} -> {Diaper, Beer} has c_m = 0.5
{Milk, Diaper} -> {Beer} has c_md = 0.67 > c_m
Computing Confidence for Rules
Unlike computing support, computing confidence does not require
several passes over the transaction list
Supports computed during frequent itemset generation can be reused
The tables below show support values for all non-empty subsets
of the itemset {Bread, Milk, Diaper}
Confidence values are shown for the (2^3 − 2) = 6 rules
generated from this frequent itemset
Itemset   Support
{Milk}    4/5 = 0.8
{Diaper}  4/5 = 0.8
{Bread}   4/5 = 0.8

Itemset          Support
{Milk, Diaper}   3/5 = 0.6
{Diaper, Bread}  3/5 = 0.6
{Bread, Milk}    3/5 = 0.6

Itemset                Support
{Bread, Milk, Diaper}  3/5 = 0.6

Example of Rules:
{Bread, Milk} -> {Diaper} (s=0.6, c=1.0)
{Milk, Diaper} -> {Bread} (s=0.6, c=1.0)
{Diaper, Bread} -> {Milk} (s=0.6, c=1.0)
{Bread} -> {Milk, Diaper} (s=0.6, c=0.75)
{Diaper} -> {Milk, Bread} (s=0.6, c=0.75)
{Milk} -> {Diaper, Bread} (s=0.6, c=0.75)
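A sketch of this reuse, with the support values above stored in a plain dictionary (the layout and names are mine, not Weka's or the textbook's):

```python
# Cached supports from frequent itemset generation (values from the slide).
supports = {
    frozenset({"Bread"}): 0.8,
    frozenset({"Milk"}): 0.8,
    frozenset({"Diaper"}): 0.8,
    frozenset({"Bread", "Milk"}): 0.6,
    frozenset({"Milk", "Diaper"}): 0.6,
    frozenset({"Diaper", "Bread"}): 0.6,
    frozenset({"Bread", "Milk", "Diaper"}): 0.6,
}

def conf(lhs, rhs):
    """Confidence from cached supports -- no pass over the transactions."""
    return supports[frozenset(lhs) | frozenset(rhs)] / supports[frozenset(lhs)]

print(conf({"Bread", "Milk"}, {"Diaper"}))  # 0.6 / 0.6 = 1.0
print(conf({"Bread"}, {"Milk", "Diaper"}))  # 0.6 / 0.8 = 0.75
```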
Rule generation in Apriori
Algorithm
For each frequent k-itemset, k ≥ 2
Generate high-confidence rules with one item in the consequent
Using these rules, iteratively generate high-confidence rules
with more than one item in the consequent
If any rule has low confidence, then all other rules whose
consequents contain its consequent can be pruned (not generated)
{Bread, Milk, Diaper}
1-item rules:
{Bread, Milk} -> {Diaper}
{Milk, Diaper} -> {Bread}
{Diaper, Bread} -> {Milk}
2-item rules:
{Bread} -> {Milk, Diaper}
{Diaper} -> {Milk, Bread}
{Milk} -> {Diaper, Bread}
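A sketch of this level-wise rule generation with confidence-based pruning, reusing the supports dictionary from the previous sketch (the minconf value is chosen arbitrarily):

```python
def rules_for(L, supports, minconf):
    """Grow consequents level by level; pruned branches are never extended."""
    L = frozenset(L)
    rules = []
    consequents = [frozenset([i]) for i in L]          # 1-item consequents
    while consequents:
        kept = []
        for rhs in consequents:
            lhs = L - rhs
            if not lhs:
                continue                                # skip {} -> L
            c = supports[L] / supports[lhs]
            if c >= minconf:
                rules.append((lhs, rhs, c))
                kept.append(rhs)                        # only survivors grow
        # Merge surviving consequents into larger ones (anti-monotonicity
        # of confidence w.r.t. the RHS justifies skipping pruned branches).
        consequents = list({a | b for a in kept for b in kept
                            if len(a | b) == len(a) + 1})
    return rules

for lhs, rhs, c in rules_for({"Bread", "Milk", "Diaper"}, supports, 0.7):
    print(set(lhs), "->", set(rhs), round(c, 2))
```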
Evaluation
Support and confidence, as used by Apriori, admit many rules
that are not necessarily interesting
Two options to extract interesting rules:
Using subjective knowledge
Using objective measures (measures better than confidence)
Subjective approaches:
Visualization: users interactively verify the discovered rules
Template-based approach: filter out rules that do not fit
user-specified templates
Subjective interestingness measures: filter out rules that are
obvious (bread -> butter) or non-actionable (do not lead to profits)
Drawback of Confidence

        Coffee  No Coffee  Total
Tea     15      5          20
No Tea  75      5          80
Total   90      10         100

Association Rule: Tea -> Coffee

Confidence = P(Coffee|Tea) = 15/20 = 0.75
but P(Coffee) = 0.9
Although confidence is high, the rule is misleading:
P(Coffee|No Tea) = 75/80 = 0.9375
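In fact, tea drinkers are less likely to drink coffee than the population at large: lift(Tea -> Coffee) = 0.75/0.9 ≈ 0.83 < 1, signalling a negative correlation.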
Objective Measures
Confidence estimates rule quality in terms of the antecedent's
support but ignores the consequent's support
As seen on the previous slide, the support of the consequent
(P(Coffee)) is higher than the rule's confidence (P(Coffee|Tea))
Weka uses other objective measures:
Lift(A -> B) = confidence(A -> B)/support(B) = support(A -> B)/(support(A)*support(B))
Leverage(A -> B) = support(A -> B) − support(A)*support(B)
Conviction(A -> B) = support(A)*support(not B)/support(A -> not B)
Conviction inverts the lift ratio and also computes the support
for the RHS not being true
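A sketch of the three measures as plain functions (the argument names are mine); the Tea -> Coffee numbers come from the contingency table on the previous slide:

```python
# s_ab = support(A -> B), s_a = support(A), s_b = support(B),
# all expressed as fractions of the transactions.
def lift(s_ab, s_a, s_b):
    return s_ab / (s_a * s_b)

def leverage(s_ab, s_a, s_b):
    return s_ab - s_a * s_b

def conviction(s_ab, s_a, s_b):
    # support(A and not B) = s_a - s_ab; undefined when confidence is 1
    return s_a * (1 - s_b) / (s_a - s_ab)

# Tea -> Coffee: s_a = 20/100, s_b = 90/100, s_ab = 15/100
print(lift(0.15, 0.20, 0.90))        # ~0.833 (< 1: negative correlation)
print(leverage(0.15, 0.20, 0.90))    # -0.03
print(conviction(0.15, 0.20, 0.90))  # 0.4
```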
