
Association Rule Mining

&
Apriori Algorithm
Binu Jasim
Data Mining (Monsoon Sem-2014)
NIT Calicut

Association Analysis
To find associations between items/objects

Market Basket Analysis
One application of association analysis

Others: bioinformatics, medical diagnosis, etc.

Two Key Challenges

Efficiently mine enormous amounts of data for association patterns
Avoid associations that occur merely by chance

What is in a Supermarket?

Items/Products: lots of them!
- Denoted as I

Transactions (T): people buying different items
A single transaction (a bill) contains the list of items customer i bought
- Denoted as tr_i

Transaction Data
Items (I): the set of all items

I = { i1, i2, ..., id }

Transactions (T): the set of all transactions

T = { tr1, tr2, ..., trN }

Transactions

[Table: binary transaction-item matrix with rows tr1 ... tr5 and columns i1 ... i5; each cell indicates whether that transaction contains that item]

Item Sets
An itemset is a collection of zero or more items

A k-itemset contains k items

A transaction trj contains an itemset X if X ⊆ trj

Document Collection as Transaction Data

Treat words as items and each document as a transaction (see the sketch below)

[Table: document-word matrix with rows D1, D2, ... and columns Word1, Word2, ..., Wordr]
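A minimal sketch of this conversion, assuming a small list of plain-text documents (the variable names and the whitespace tokenization are illustrative, not from the slides):

```python
# Each document becomes a transaction whose items are the distinct words it contains.
docs = [
    "milk bread butter",
    "bread butter jam",
    "milk bread",
]

transactions = [set(doc.lower().split()) for doc in docs]

for i, tr in enumerate(transactions, start=1):
    print(f"D{i}: {sorted(tr)}")
```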

Images as Transaction Data

Each pixel as an item and an image as a transaction

So 'A' may be represented as
0111101010100111 - for a 4x4 image resolution
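As a rough sketch, the bit string above can be turned into a transaction by treating each "on" pixel position as an item (the p(row, col) naming scheme is an assumption made here for illustration):

```python
bits = "0111101010100111"   # 4x4 binary image, row-major order

# Items are the "on" pixel positions, named p(row, col) for illustration.
transaction = {
    f"p({idx // 4},{idx % 4})"
    for idx, bit in enumerate(bits)
    if bit == "1"
}
print(sorted(transaction))
```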

Associations
{A} -> {B} indicates item B is also bought if item A is bought
{A, B} -> {C} indicates C is bought if items A & B are bought together

So we have associations of the form X -> Y
where X ∩ Y = ∅
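One simple way to represent such a rule in code is as a pair of disjoint, non-empty sets; a small sketch (the Rule class is made up for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset   # X
    consequent: frozenset   # Y

    def __post_init__(self):
        # A valid association rule has non-empty, disjoint X and Y.
        assert self.antecedent and self.consequent
        assert not (self.antecedent & self.consequent)

print(Rule(frozenset({"A", "B"}), frozenset({"C"})))   # {A,B} -> {C}
```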

Total # Associations
If we have d items,
then total # associations = 3^d - 2^(d+1) + 1

Eg: item set {A, B}
Possible associations: {A} -> {B} and {B} -> {A}
d = 2, so 3^2 - 2^3 + 1 = 2

Total # Associations
I = {A, B, C}
d = 3
Total # associations = 3^3 - 2^4 + 1 = 12
{A} -> {B, C}

{B, C} -> {A}

{B} -> {A, C}

{A, C} -> {B}

{C} -> {A, B}

{A, B}-> {C}

{A}->{B}

{B}-> {A}

{A}->{C}

{C}-> {A}

{B}->{C}

{C}->{B}

Proof !
Each item can go into one of 3 "boxes": the antecedent X, the consequent Y, or neither
- giving 3^d assignments in total

The antecedent and consequent can't be empty
- subtract the 2^d assignments with empty X, the 2^d with empty Y, and add back the 1 case (both empty) that was subtracted twice

Which gives 3^d - 2^(d+1) + 1
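A quick brute-force check of this count, enumerating every rule X -> Y with disjoint, non-empty X and Y over d items (a small verification sketch, not from the slides):

```python
from itertools import combinations

def count_rules(items):
    """Count rules X -> Y where X, Y are non-empty, disjoint subsets of the items."""
    subsets = [frozenset(c)
               for r in range(1, len(items) + 1)
               for c in combinations(items, r)]
    return sum(1 for x in subsets for y in subsets if not (x & y))

for d in range(1, 6):
    assert count_rules(range(d)) == 3**d - 2**(d + 1) + 1
print("formula verified for d = 1..5")   # e.g. d=2 -> 2 rules, d=3 -> 12 rules
```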

Support Count (σ)
The support count of an itemset X is given as

σ(X) = |{ trj : X ⊆ trj , trj ∈ T }|
Eg: [slide example: two itemsets checked against the transactions tr1 and tr2 shown, giving support counts σ = 2 and σ = 0]
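A direct translation of this definition into Python might look like the following sketch (the transaction data here is made up for illustration):

```python
def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for tr in transactions if set(itemset) <= tr)

transactions = [
    {"bread", "milk"},            # tr1
    {"bread", "butter", "milk"},  # tr2
]
print(support_count({"bread", "milk"}, transactions))   # 2
print(support_count({"jam"}, transactions))             # 0
```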

Support & Confidence

We have the association X -> Y s.t. X ∩ Y = ∅

Support(X -> Y) = σ(X ∪ Y) / N   (N = total number of transactions)

Confidence(X -> Y) = σ(X ∪ Y) / σ(X)
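These two formulas translate directly into code; a minimal self-contained sketch (it repeats the support_count helper from the previous sketch):

```python
def support_count(itemset, transactions):
    # sigma(X): number of transactions containing every item of X (as above).
    return sum(1 for tr in transactions if set(itemset) <= tr)

def support(x, y, transactions):
    # Support(X -> Y) = sigma(X ∪ Y) / N, where N is the number of transactions.
    return support_count(x | y, transactions) / len(transactions)

def confidence(x, y, transactions):
    # Confidence(X -> Y) = sigma(X ∪ Y) / sigma(X).
    return support_count(x | y, transactions) / support_count(x, transactions)
```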

Support & Confidence Thresholds

R: {C, D} -> {E}   (association rule)

Support(R) = 3/6 = 0.5 > minsup

Confidence(R) = 3/4 = 0.75 > minconf
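As a usage example, a made-up six-transaction dataset consistent with the numbers above (σ({C,D,E}) = 3, σ({C,D}) = 4) reproduces the slide's values, using the support and confidence functions sketched earlier:

```python
transactions = [
    {"C", "D", "E"},
    {"C", "D", "E"},
    {"C", "D", "E"},
    {"C", "D"},
    {"A", "B"},
    {"B", "E"},
]
x, y = {"C", "D"}, {"E"}
print(support(x, y, transactions))     # 0.5
print(confidence(x, y, transactions))  # 0.75
```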

Why large support?

Items people seldom buy can still yield high confidence. Eg: {1GB usb} -> {headset}
- may give large confidence because the support count of {1GB usb} (the denominator) is small
- but {1GB usb, headset} occurring together is rare, so the rule has small support
- the minsup threshold filters out such rules

Why large confidence?

Confidence is a better measure than support for indicating how often items are bought together.
Confidence(X -> Y) also gives an estimate of the conditional probability of Y given X.

Caution!: Correlation doesn't imply Causation

Just because a rule X -> Y has large support and large confidence,
X need not be the cause of Y.
It only implies correlation.

Association Rule Mining Problem

Given a set of transactions T, find all the rules having
support > minsup & confidence > minconf
Eg: minsup = 20%, minconf = 50%
Obvious: minconf > minsup
(since confidence(R) ≥ support(R) for every rule, a minconf below minsup would add no extra filtering)

Association Rule Mining

Brute force: list all the rules and compute support & confidence for each
- Computationally expensive: O(3^d)
Better strategy: check minsup for all the itemsets (subsets of I) first
- Still exponential: O(2^d)

The Idea
If {Milk, Bread, Butter} is frequent,
then all of its subsets are also frequent

i.e. if support = σ({Milk, Bread, Butter}) / N > minsup
then σ({Milk, Bread}) / N > minsup
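The contrapositive of this observation is what Apriori exploits for pruning: if any subset of a candidate itemset is infrequent, the candidate itself cannot be frequent. A rough sketch of that pruning check (assuming the frequent (k-1)-itemsets are kept as a set of frozensets; names and data are illustrative):

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Prune a candidate k-itemset if any of its (k-1)-subsets is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

frequent_2 = {frozenset({"Milk", "Bread"}),
              frozenset({"Milk", "Butter"}),
              frozenset({"Bread", "Butter"})}
print(has_infrequent_subset({"Milk", "Bread", "Butter"}, frequent_2))  # False: keep
print(has_infrequent_subset({"Milk", "Bread", "Jam"}, frequent_2))     # True: prune
```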

Decomposed into 2 Sub-Tasks

Frequent Itemset Generation: find all the itemsets that satisfy the minsup threshold
Rule Generation: extract all the high-confidence rules from the frequent itemsets found in step 1
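Putting the two sub-tasks together, a rough end-to-end sketch is shown below; it brute-forces candidate itemsets rather than using Apriori's optimized candidate generation, and all function names and the toy data are illustrative:

```python
from itertools import combinations

def support_count(itemset, transactions):
    # sigma(X), as defined earlier.
    return sum(1 for tr in transactions if set(itemset) <= tr)

def frequent_itemsets(transactions, minsup):
    """Sub-task 1: all itemsets whose support exceeds minsup (brute-force version)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            sup = support_count(cand, transactions) / n
            if sup > minsup:
                frequent[frozenset(cand)] = sup
    return frequent

def generate_rules(frequent, transactions, minconf):
    """Sub-task 2: extract high-confidence rules X -> Y from each frequent itemset."""
    rules = []
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for x in combinations(itemset, k):
                x = frozenset(x)
                y = itemset - x
                conf = (support_count(itemset, transactions) /
                        support_count(x, transactions))
                if conf > minconf:
                    rules.append((set(x), set(y), conf))
    return rules

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"}]
freq = frequent_itemsets(transactions, minsup=0.5)
for x, y, conf in generate_rules(freq, transactions, minconf=0.6):
    print(x, "->", y, round(conf, 2))
```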
