
Unsupervised Learning

 Clustering
 Unsupervised classification, that is, classification without the class attribute
 Want to discover the classes

 Association Rule Discovery
 Discover correlations among attributes

The Clustering Process
 Pattern representation
 Definition of pattern proximity measure
 Clustering
 Data abstraction
 Cluster validation
Pattern Representation
 Number of classes
 Number of available patterns
 Circles, ellipses, squares, etc.
 Feature selection
 Can we use wrappers and filters?
 Feature extraction
 Produce new features
 E.g., principal component analysis (PCA); see the sketch below
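A minimal sketch of PCA-based feature extraction, assuming scikit-learn and NumPy are available; the toy data matrix and the choice of two components are illustrative.

```python
# Feature extraction sketch: project correlated attributes onto principal components.
import numpy as np
from sklearn.decomposition import PCA

# Toy numeric data set: 6 instances, 3 correlated attributes (illustrative values).
X = np.array([
    [1.0,  2.1, 0.9],
    [2.0,  3.9, 2.1],
    [3.0,  6.2, 2.9],
    [4.0,  7.8, 4.2],
    [5.0, 10.1, 5.1],
    [6.0, 12.2, 5.8],
])

# Produce two new features (principal components) from the three original ones.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("transformed patterns:\n", Z)
```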
Pattern Proximity
 Want clusters of instances that are similar
to each other but dissimilar to others
 Need a similarity measure
 Continuous case
 Euclidean measure (compact, isolated clusters)
 The squared Mahalanobis distance

d_M(x_i, x_j) = (x_i - x_j)^T \Sigma^{-1} (x_i - x_j)

where \Sigma is the sample covariance matrix; this alleviates problems with correlated attributes
 Many more measures
Pattern Proximity
 Nominal attributes

d(x_i, x_j) = \frac{n - x}{n}

n = number of attributes, x = number of attributes with the same value (a sketch of these distance measures follows)
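A minimal sketch of the proximity measures above (Euclidean distance, squared Mahalanobis distance, and the nominal mismatch distance), assuming NumPy; the toy vectors and the covariance estimate are illustrative.

```python
# Pattern-proximity sketch: Euclidean, squared Mahalanobis, and nominal mismatch.
import numpy as np

def euclidean(xi, xj):
    return float(np.sqrt(np.sum((xi - xj) ** 2)))

def squared_mahalanobis(xi, xj, cov):
    # d_M(xi, xj) = (xi - xj)^T Sigma^{-1} (xi - xj)
    diff = xi - xj
    return float(diff @ np.linalg.inv(cov) @ diff)

def nominal_distance(xi, xj):
    # d = (n - x) / n, where x is the number of attributes with matching values
    n = len(xi)
    x = sum(a == b for a, b in zip(xi, xj))
    return (n - x) / n

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 5.0], [4.0, 3.5]])  # toy numeric data
cov = np.cov(X, rowvar=False)                                   # estimated covariance

print(euclidean(X[0], X[3]))
print(squared_mahalanobis(X[0], X[3], cov))
print(nominal_distance(("sunny", "hot", "high"), ("sunny", "mild", "high")))
```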

Clustering Techniques
Clustering
 Hierarchical
  Single Link
  Complete Link
  CobWeb
 Partitional
  Square Error: K-means
  Mixture Maximization: Expectation Maximization
Technique Characteristics
 Agglomerative vs Divisive
 Agglomerative: each instance starts in its own cluster and the algorithm merges clusters
 Divisive: begins with all instances in one cluster and divides it up
 Hard vs Fuzzy
 Hard clustering assigns each instance to exactly one cluster, whereas fuzzy clustering assigns a degree of membership in each cluster
More Characteristics
 Monothetic vs Polythetic
 Polythetic: all attributes are used simultaneously, e.g., to
calculate distance (most algorithms)
 Monothetic: attributes are considered one at a time
 Incremental vs Non-Incremental
 With large data sets it may be necessary to consider only
part of the data at a time (data mining)
 Incremental works instance by instance

Hierarchical Clustering
Dendrogram
(Figure: a dendrogram over instances A-G; the instances are listed along the horizontal axis and the vertical axis shows the similarity level at which clusters are merged.)

Hierarchical Algorithms
 Single-link
 Distance between two clusters is the minimum distance over all pairs of instances, one from each cluster
 More versatile
 Produces (sometimes too) elongated clusters
 Complete-link
 Distance between two clusters is the maximum distance over all pairs of instances, one from each cluster
 Produces tightly bound, compact clusters
 Often more useful in practice (see the sketch below)
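A minimal sketch comparing single-link and complete-link agglomerative clustering, assuming SciPy is available; the synthetic data (two groups joined by a chain of points) is an illustrative assumption.

```python
# Single-link vs complete-link agglomerative clustering on toy 2-D data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
group1 = rng.normal([0.0, 0.0], 0.3, size=(20, 2))            # compact group 1
group2 = rng.normal([6.0, 0.0], 0.3, size=(20, 2))            # compact group 2
chain = np.column_stack([np.linspace(1, 5, 8), np.zeros(8)])  # bridge of points
X = np.vstack([group1, group2, chain])

for method in ("single", "complete"):
    Z = linkage(X, method=method)                    # build the hierarchy
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
    print(method, "cluster sizes:", np.bincount(labels)[1:])
```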
Example: Clusters Found
(Figure: the clusters found by single-link (top) and complete-link (bottom) on the same two-dimensional data. The points labeled 1 and 2 show the cluster assignments; a chain of points (*) lies between the two groups, which single-link tends to follow, producing elongated clusters, while complete-link produces compact ones.)

Partitional Clustering
 Output a single partition of the data
into clusters
 Good for large data sets
 Determining the number of clusters is a
major challenge

K-Means
Predetermined number of clusters

Start with seed clusters of one element

(Figure: the initial seed points)
Assign Instances to Clusters

Find New Centroids

New Clusters

Discussion: k-means
 Applicable to fairly large data sets
 Sensitive to initial centers
 Use other heuristics to find good initial
centers
 Converges to a local optimum
 Specifying the number of clusters is largely subjective (a sketch of the algorithm follows below)
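A minimal sketch of the k-means loop described on the preceding slides (seed, assign instances, recompute centroids, repeat), assuming NumPy; the toy data, k = 2, and the convergence test are illustrative choices.

```python
# k-means sketch: seed clusters, assign instances, recompute centroids, repeat.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # seed clusters
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each instance to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned instances.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                                  # converged (local optimum)
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(6.0, 1.0, (30, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```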

Clustering in Weka
 Clustering algorithms in Weka
 K-Means
 Expectation Maximization (EM)
 Cobweb: hierarchical, incremental, and agglomerative

CobWeb
 Algorithm (main) characteristics:
 Hierarchical and incremental
 Uses category utility:

CU(C_1, C_2, \ldots, C_k) = \frac{1}{k} \sum_{l} \Pr[C_l] \sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)

The outer sum runs over the k clusters; the inner sums run over the attributes a_i and over all possible values v_{ij} of each attribute. The term in parentheses is the improvement in the probability estimate due to the instance-to-cluster assignment.
 Why divide by k?

Category Utility
 If each instance is in its own cluster:

\Pr[a_i = v_{ij} \mid C_l] = 1 if v_{ij} is the actual value of the instance, 0 otherwise

 The category utility function then becomes

CU(C_1, C_2, \ldots, C_k) = \frac{n - \sum_i \sum_j \Pr[a_i = v_{ij}]^2}{k}
 Without dividing by k it would always be best for each instance to have its own cluster: overfitting! (A sketch computing category utility follows.)
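A minimal sketch that evaluates the category utility of a hard partition of nominal instances; the helper name and the three weather-like instances (labeled a, e, and f as in the example later in these slides) are illustrative assumptions. Grouping the two similar instances e and f scores higher than putting every instance in its own cluster.

```python
# Category utility of a partition of nominal instances (sketch).
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters, each a list of instances (tuples of nominal values)."""
    all_instances = [inst for cl in clusters for inst in cl]
    n_total, n_attrs, k = len(all_instances), len(all_instances[0]), len(clusters)

    def sum_sq_probs(instances):
        # Sum over attributes i and values v of Pr[a_i = v]^2 within `instances`.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(inst[i] for inst in instances)
            total += sum((c / len(instances)) ** 2 for c in counts.values())
        return total

    base = sum_sq_probs(all_instances)
    weighted = sum(len(cl) / n_total * (sum_sq_probs(cl) - base) for cl in clusters)
    return weighted / k

# Toy instances: (outlook, temperature, humidity, windy).
a = ("sunny", "hot", "high", "false")
e = ("rainy", "cool", "normal", "false")
f = ("rainy", "cool", "normal", "true")
print(category_utility([[a], [e, f]]))    # e and f grouped, a on its own
print(category_utility([[a], [e], [f]]))  # every instance in its own cluster
```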
The Weather Problem
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Weather Data (without Play)
 Label the instances a, b, ..., n
 Start by putting the first instance (a) in its own cluster, then add the next instance (b) in its own cluster
(Figure: cluster {a}, then clusters {a} and {b})

Adding the Third Instance
Evaluate the category utility of adding the instance to one
of the two clusters versus adding it as its own cluster

(Figure: the three candidate partitions after adding c: c merged with a's cluster, c merged with b's cluster, or c in its own cluster. The partition with the highest category utility is kept.)

Adding Instance f
First instance not to get its own cluster:
(Tree: a, b, c, and d in their own clusters; e and f grouped under one node)
Look at the instances:
Rainy Cool Normal FALSE
Rainy Cool Normal TRUE
Quite similar!

Add Instance g
Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
G) Overcast Cool Normal TRUE

(Tree: a, b, c, and d in their own clusters; e, f, and g grouped under one node)

Add Instance h
Look at the instances (the best matching node and the runner-up):
A) Sunny Hot High FALSE
D) Rainy Mild High FALSE
H) Sunny Mild High FALSE
Rearrange: the best matching node and the runner-up (a and d) are merged into a single cluster before h is added.
(Tree: root with children {a, d, h}, b, c, and {e, f, g})
(Splitting is also possible)


Final Hierarchy

(Figure: the final CobWeb hierarchy over all fourteen instances a-n.)
What next?

Dendrogram → Clusters

(Figure: the same hierarchy, cut at a level that places a, b, c, d, h, k, and l in one cluster.)
What do a, b, c, d, h, k, and l have in common?

Numerical Attributes
 Assume normal distribution
CU(C_1, C_2, \ldots, C_k) = \frac{1}{k} \sum_l \Pr[C_l] \, \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)
 Problems with zero variance!
 The acuity parameter imposes a minimum
variance

Hierarchy Size (Scalability)
 May create very large hierarchy
 The cutoff parameter is used to suppress growth:
If CU(C_1, C_2, \ldots, C_k) < Cutoff, the node is cut off.

Discussion
 Advantages
 Incremental → scales to a large number of instances
 Cutoff → limits the size of the hierarchy
 Handles mixed attributes
 Disadvantages
 Incremental → sensitive to the order of instances?
 Arbitrary choice of parameters:
 dividing by k,
 artificial minimum value for the variance of numeric attributes,
 ad hoc cutoff value
Probabilistic Perspective
 Most likely set of clusters given data
 Probability of each instance belonging to a
cluster
 Assumption: instances are drawn from one of
several distributions
 Goal: estimate the parameters of these
distributions
 Usually: assume distributions are normal

Mixture Resolution
 Mixture: a set of k probability distributions
 Represent the k clusters
 Probabilities that an instance takes certain attribute values, given that it is in the cluster
 What is the probability that an instance belongs to a cluster (or a distribution)?

One Numeric Attribute
Two-cluster mixture model (figure: two overlapping normal densities, Cluster A and Cluster B, over a single attribute).

Given some data, how can you determine the parameters?
\mu_A = mean for Cluster A
\sigma_A = standard deviation for Cluster A
\mu_B = mean for Cluster B
\sigma_B = standard deviation for Cluster B
p_A = probability of being in Cluster A
Problems
 If we knew which instance came from each
cluster we could estimate these values
 If we knew the parameters we could calculate
the probability that an instance belongs to
each cluster
\Pr[A \mid x] = \frac{\Pr[x \mid A] \, \Pr[A]}{\Pr[x]} = \frac{f(x; \mu_A, \sigma_A) \, p_A}{\Pr[x]}

f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
EM Algorithm
 Expectation Maximization (EM)
 Start with initial values for the parameters
 Calculate the cluster probabilities for each instance
 Re-estimate the values for the parameters
 Repeat
 General-purpose maximum likelihood estimation algorithm for missing data (the two-cluster case is sketched below)
 Can also be used to train Bayesian networks (later)
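A minimal sketch of the EM loop for the one-attribute, two-cluster mixture described on the previous slides, assuming NumPy; the synthetic data and the initial parameter guesses are illustrative.

```python
# EM for a two-component 1-D Gaussian mixture (sketch).
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.5, 300)])

# Initial guesses for the parameters.
mu_a, sigma_a, mu_b, sigma_b, p_a = -1.0, 1.0, 6.0, 1.0, 0.5

for _ in range(100):
    # E-step: probability that each instance belongs to cluster A.
    wa = p_a * normal_pdf(x, mu_a, sigma_a)
    wb = (1 - p_a) * normal_pdf(x, mu_b, sigma_b)
    resp = wa / (wa + wb)
    # M-step: re-estimate the parameters from the weighted instances.
    mu_a = np.sum(resp * x) / np.sum(resp)
    mu_b = np.sum((1 - resp) * x) / np.sum(1 - resp)
    sigma_a = np.sqrt(np.sum(resp * (x - mu_a) ** 2) / np.sum(resp))
    sigma_b = np.sqrt(np.sum((1 - resp) * (x - mu_b) ** 2) / np.sum(1 - resp))
    p_a = float(resp.mean())

print(mu_a, sigma_a, mu_b, sigma_b, p_a)
```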
Beyond Normal Models
 More than one class:
 Straightforward
 More than one numeric attribute
 Easy if assume attributes independent
 If dependent attributes, treat them jointly
using the bivariate normal
 Nominal attributes
 No more normal distribution!

EM using Weka
 Options
 numClusters: number of clusters (the default, -1, selects it automatically)
 maxIterations: maximum number of iterations
 seed: random number seed
 minStdDev: minimum allowable standard deviation

Other Clustering
 Artificial Neural Networks (ANN)
 Random search
 Genetic Algorithms (GA)
 GA used to find initial centroids for k-means
 Simulated Annealing (SA)
 Tabu Search (TS)
 Support Vector Machines (SVM)
 Will discuss GA and SVM later
Applications
 Image segmentation
 Object and Character Recognition
 Data Mining:
 Stand-alone, to gain insight into the data
 Preprocessing step before classification, which then operates on the detected clusters
DM Clustering Challenges
 Data mining deals with large databases
 Scalability with respect to the number of instances
 Use a random sample (possible bias)
 Dealing with mixed data
 Many algorithms only make sense for numeric
data
 High dimensional problems
 Can the algorithm handle many attributes?
 How do we interpret a cluster in high dimensions?

Other (General) Challenges
 Shape of clusters
 Minimum domain knowledge (e.g.,
knowing the number of clusters)
 Noisy data
 Insensitivity to instance order
 Interpretability and usability

Clustering for DM
 Main issue is scalability to large databases
 Many algorithms have been developed for
scalable clustering:
 Partitional methods: CLARA, CLARANS
 Hierarchical methods: AGNES, DIANA, BIRCH,
CURE, Chameleon

Practical Partitional Clustering
Algorithms
 Classic k-Means (1967)
 Work from 1990 and later:
 k-Medoids
 Uses the medoid instead of the centroid
 Less sensitive to outliers and noise
 Computations are more costly
 PAM (Partitioning Around Medoids) algorithm

Large-Scale Problems
 CLARA: Clustering LARge Applications
 Select several random samples of instances
 Apply PAM to each
 Return the best clusters
 CLARANS:
 Similar to CLARA
 Draws samples randomly while searching
 More effective than PAM and CLARA
Hierarchical Methods
 BIRCH: Balanced Iterative Reducing and
Clustering using Hierarchies
 Clustering feature: a triplet (N, LS, SS) summarizing a subcluster: the number of points, their linear sum, and their sum of squared values (see the sketch below)
 Clustering feature (CF) tree: a height-balanced tree that stores the clustering features
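A minimal sketch of the clustering feature idea, assuming the triplet form CF = (N, LS, SS); the CF class, the two-dimensional points, and the radius computation shown are illustrative, not BIRCH's actual implementation.

```python
# Sketch of a BIRCH-style clustering feature: CF = (N, LS, SS).
import numpy as np
from dataclasses import dataclass, field

@dataclass
class CF:
    n: int = 0                                                   # number of points
    ls: np.ndarray = field(default_factory=lambda: np.zeros(2))  # linear sum
    ss: float = 0.0                                              # sum of squared norms

    def add(self, x):
        self.n += 1
        self.ls = self.ls + x
        self.ss += float(x @ x)

    def merge(self, other):
        # Two subcluster summaries combine by simple addition.
        return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Root of the average squared distance to the centroid, from (N, LS, SS) alone.
        return float(np.sqrt(max(self.ss / self.n - self.centroid() @ self.centroid(), 0.0)))

cf1, cf2 = CF(), CF()
for x in np.array([[0.0, 0.0], [1.0, 1.0]]):
    cf1.add(x)
for x in np.array([[5.0, 5.0], [6.0, 4.0]]):
    cf2.add(x)
merged = cf1.merge(cf2)
print(merged.centroid(), merged.radius())
```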

BIRCH Mechanism
 Phase I:
 Scan database to build an initial CF tree
 Multilevel compression of the data
 Phase II:
 Apply a selected clustering algorithm to the
leaf nodes of the CF tree
 Has been found to be very scalable

Conclusion
 The use of clustering in data mining
practice seems to be somewhat limited
due to scalability problems
 More commonly used unsupervised
learning:

Association Rule Discovery


Association Rule Discovery
 Aims to discover interesting correlations or other relationships in large databases
 Finds rules of the form
if A and B then C and D
 Which attributes will be included in the
relation is unknown

Mining Association Rules

 Similar to classification rules
 Use the same procedure?
 Every attribute is treated the same: any attribute may appear on the right-hand side
 Apply the procedure to every possible expression on the right-hand side
 Huge number of rules → infeasible
 Only want rules with high coverage/support

Market Basket Analysis

 Basket data: items purchased on a per-transaction basis (not cumulative)
 How do you boost the sales of a given product?
 What other products does discontinuing a product impact?
 Which products should be shelved together?
 Terminology (market basket analysis):
 Item: an attribute/value pair
 Item set: a combination of items with minimum coverage

How Many k-Item Sets Have
Minimum Coverage?
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Item Sets
1-Item | 2-Item | 3-Item | 4-Item
Outlook=sunny (5) | Outlook=sunny temp=mild (2) | Outlook=sunny temp=hot humidity=high (2) | Outlook=sunny temp=hot humidity=high play=no (2)
Outlook=overcast (4) | Outlook=sunny temp=hot (2) | Outlook=sunny temp=hot play=no (2) | Outlook=sunny humidity=high windy=false play=no (2)
Outlook=rainy (5) | Outlook=sunny humidity=norm (2) | Outlook=sunny humidity=norm play=yes (2) | Outlook=overcast temp=hot windy=false play=yes (2)
Temp=cool (4) | Outlook=sunny windy=true (2) | Outlook=sunny humidity=high windy=false (2) | Outlook=rainy temp=mild windy=false play=yes (2)
Temp=mild (6) | Outlook=sunny windy=true (2) | Outlook=sunny humidity=high play=no (3) | Outlook=rainy humidity=norm windy=false play=yes (2)
(first five rows shown; coverage in parentheses)
From Sets to Rules
3-Item Set w/coverage 4:
Humidity = normal, windy = false, play = yes

Association Rules: Accuracy

If humidity = normal and windy = false then play = yes 4/4


If humidity = normal and play = yes then windy = false 4/6
If windy = false and play = yes then humidity = normal 4/6
If humidity = normal then windy = false and play = yes 4/7
If windy = false then humidity = normal and play = yes 4/8
If play = yes then humidity = normal and windy = false 4/9
If - then humidity = normal and windy = false and play=yes 4/12

From Sets to Rules
(continued)
4-Item Set w/coverage 2:
Temperature = cool, humidity = normal,
windy = false, play = yes

Association Rules: Accuracy

 If temperature = cool, windy = false → humidity = normal, play = yes 2/2
 If temperature = cool, humidity = normal, windy = false → play = yes 2/2
 If temperature = cool, windy = false, play = yes → humidity = normal 2/2

Overall

 Minimum coverage (2):
 12 1-item sets, 47 2-item sets, 39 3-item sets, 6 4-item sets
 Minimum accuracy (100%):
 58 association rules

“Best” Rules (Coverage = 4, Accuracy = 100%)
 If humidity = normal and windy = false → play = yes
 If temperature = cool → humidity = normal
 If outlook = overcast → play = yes
Association Rule Mining

 STEP 1: Find all item sets that meet minimum coverage
 STEP 2: Find all rules that meet minimum accuracy
 STEP 3: Prune
Generating Item Sets
 How do we generate minimum coverage item
sets in a scalable manner?
 Total number of item set is huge
 Grows exponentially in the number of attributes
 Need an efficient algorithm:
 Start by generating minimum coverage 1-item sets
 Use those to generate 2-item sets, etc
 Why do we only need to consider minimum
coverage 1-item sets?

Justification
Item Set 1: {Humidity = high}
Coverage(1) = number of times humidity is high

Item Set 2: {Windy = false}
Coverage(2) = number of times windy is false

Item Set 3: {Humidity = high, Windy = false}
Coverage(3) = number of times humidity is high and windy is false

Coverage(3) ≤ Coverage(1) and Coverage(3) ≤ Coverage(2), so if Item Sets 1 and 2 do not both meet minimum coverage, Item Set 3 cannot either.
Generating Item Sets
Start with all 3-item sets that meet minimum coverage:
(A B C), (A B D), (A C D), (A C E)

Merge to generate 4-item sets, considering only sets that start with the same two attributes; there are only two 4-item sets that could possibly work:
(A B C D), (A C D E)

These are candidate 4-item sets; whether they have minimum coverage must still be checked. (A sketch of this level-wise generation follows.)
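A minimal sketch of the level-wise generation just illustrated: merge frequent item sets that share all but one item, prune candidates that have an infrequent subset, then check coverage. The function name, the toy basket transactions, and minsup = 2 are illustrative assumptions.

```python
# Apriori-style level-wise generation of frequent item sets (sketch).
from itertools import combinations

def frequent_item_sets(transactions, minsup):
    transactions = [frozenset(t) for t in transactions]
    support = lambda s: sum(s <= t for t in transactions)
    # Start with the 1-item sets that meet minimum coverage.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= minsup}
    frequent = {s: support(s) for s in level}
    k = 1
    while level:
        # Merge sets with all but one item in common, keep only candidates
        # whose every k-subset is already frequent, then check coverage.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {c for c in candidates if support(c) >= minsup}
        frequent.update({s: support(s) for s in level})
        k += 1
    return frequent

db = [{"bread", "milk"}, {"bread", "beer", "eggs"}, {"milk", "beer", "cola"},
      {"bread", "milk", "beer"}, {"bread", "milk", "cola"}]
for itemset, sup in sorted(frequent_item_sets(db, minsup=2).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```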

Algorithm for Generating Item
Sets
 Build up from 1-item sets so that we only consider item sets that are found by merging two minimum-coverage sets
 Only consider sets that have all but one item in common
 Computational efficiency further
improved using hash tables

Generating Rules
If windy = false and play = no then outlook = sunny and humidity = high
(meets minimum coverage and accuracy)

If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
(both also meet minimum coverage and accuracy)

How Many Rules?
 Want to consider every possible subset
of attributes as consequent
 Have 4 attributes:
 Four single consequent rules
 Six double consequent rules
 Two triple consequent rules
 Twelve possible rules for single 4-item set!
 Exponential explosion of possible rules
Must We Check All?
If A and B then C and D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A and B are true)

If A, B, and C then D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A, B, and C are true)

Efficiency Improvement
 A double-consequent rule can only be OK if both single-consequent rules are OK
 Procedure:
 Start with single-consequent rules
 Build up double-consequent rules, etc.
 Generate candidate rules
 Check the candidates for accuracy
 In practice: far fewer rules need to be checked (see the sketch below)
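A companion sketch for the rule-building step, restricted to single-consequent rules for brevity; the hand-made frequent dictionary (item set to coverage) and the accuracy threshold are illustrative assumptions.

```python
# Sketch: generate single-consequent rules from frequent item sets.
# `frequent` maps a frozenset of items to its coverage; a tiny hand-made
# example stands in for the output of a real item set mining step.
frequent = {
    frozenset(["bread"]): 4, frozenset(["milk"]): 4, frozenset(["beer"]): 3,
    frozenset(["bread", "milk"]): 3, frozenset(["milk", "beer"]): 2,
}

def single_consequent_rules(frequent, min_accuracy):
    rules = []
    for itemset, coverage in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Accuracy = coverage(whole set) / coverage(antecedent).
            accuracy = coverage / frequent[antecedent]
            if accuracy >= min_accuracy:
                rules.append((set(antecedent), consequent, coverage, accuracy))
    return rules

for ant, cons, cov, acc in single_consequent_rules(frequent, min_accuracy=0.6):
    print(f"if {sorted(ant)} then {cons}  (coverage={cov}, accuracy={acc:.2f})")
```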

Apriori Algorithm
 This is a simplified description of the
Apriori algorithm
 Developed in the early 1990s; the most commonly used approach
 New developments focus on
 Generating item sets more efficiently
 Generating rules from item sets more
efficiently

Association Rule Discovery
using Weka
 Parameters to be specified in Apriori:
 upperBoundMinSupport: start with this value
of minimum support
 delta: in each step decrease the minimum
support required by this value
 lowerBoundMinSupport: final minimum
support
 numRules: how many rules are generated
 metricType: confidence, lift, leverage, conviction
 minMetric: smallest acceptable value for a rule
 Handles only nominal attributes
Difficulties
 Apriori algorithm improves performance
by using candidate item sets
 Still some problems …
 Costly to generate a large number of item sets
 To generate a frequent pattern of size 100, more than 2^100 ≈ 10^30 candidates are needed!
 Requires repeated scans of the database to check candidates
 Again, most problematic for long patterns
Solution?
 Can candidate generation be avoided?
 New approach:
 Create a frequent pattern tree (FP-tree)
 stores information on frequent patterns
 Use the FP-tree for mining frequent
patterns
 partitioning-based
 divide-and-conquer
 (as opposed to bottom-up generation)
Database → FP-Tree

TID | Items | Frequent Items
100 | F,A,C,D,G,I,M,P | F,C,A,M,P
200 | A,B,C,F,L,M,O | F,C,A,B,M
300 | B,F,H,J,O | F,B
400 | B,C,K,S,P | C,B,P
500 | A,F,C,E,L,P,M,N | F,C,A,M,P
(Minimum support = 3)

(Figure: the resulting FP-tree. The header table lists the items F, C, A, B, M, P, each with the head of its node links. Under the root: a branch F:4 with children C:3 and B:1; C:3 has child A:3; A:3 has children M:2, followed by P:2, and B:1, followed by M:1; a second branch C:1 has child B:1, followed by P:1. A construction sketch follows below.)
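A compact sketch of the two-scan FP-tree construction, with an assumed node class and a header table standing in for the node links; ties in item frequency are broken alphabetically here, so the item order may differ slightly from the F, C, A, B, M, P order shown on the slide.

```python
# Two-scan FP-tree construction with a header table of node links (sketch).
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}                     # item -> child Node

def build_fp_tree(transactions, minsup):
    # Scan 1: count items and keep only the frequent ones.
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= minsup}
    order = lambda t: sorted((i for i in t if i in frequent),
                             key=lambda i: (-counts[i], i))   # consistent ordering
    root = Node(None, None)
    header = defaultdict(list)                 # item -> nodes carrying that item
    # Scan 2: insert each transaction, most frequent items first.
    for t in transactions:
        node = root
        for item in order(t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])   # thread the node link
            else:
                node.children[item].count += 1
            node = node.children[item]
    return root, header

# The five transactions from the slide (frequent items F, C, A, B, M, P at minsup 3).
db = [list("FACDGIMP"), list("ABCFLMO"), list("BFHJO"), list("BCKSP"), list("AFCELPMN")]
root, header = build_fp_tree(db, minsup=3)
for item, nodes in header.items():
    print(item, "node counts:", [n.count for n in nodes])
```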
Computational Effort
 Each node has three fields
 item name
 count
 node link
 Also a header table with
 item name
 head of node link
 Need two scans of the database
 Collect set of frequent items
 Construct the FP-tree
Comments
 The FP-tree is a compact data structure
 The FP-tree contains all the information
related to mining frequent patterns (given the
support)
 The size of the tree is bounded by the
occurrences of frequent items
 The height of the tree is bounded by the
maximum number of items in a transaction
Mining Patterns
 Mine the complete set of frequent patterns
 For any frequent item A, all possible patterns containing A can be obtained by following A's node links, starting from A's head of node links

Example
(Figure: the FP-tree from the previous slide, following item P's node links.)
Frequent pattern: (P:3)
Paths containing P:
<F:4, C:3, A:3, M:2, P:2>, where P occurs twice
<C:1, B:1, P:1>, where P occurs once
Rule Generation
 Mining complete set of association rules
has some problems
 May be a large number of frequent item
sets
 May be a huge number of association rules

 One potential solution is to look at


closed item sets only
Frequent Closed Item Sets
 An item set X is a closed item set if there is no item set X' such that X ⊂ X' and every transaction containing X also contains X'
 A rule X → Y is an association rule on frequent closed item sets if
 both X and X∪Y are frequent closed item sets, and
 there does not exist a frequent closed item set Z such that X ⊂ Z ⊂ X∪Y
(A brute-force sketch of the closedness test follows.)
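A brute-force sketch of the closedness test implied by this definition (an item set is closed if no proper superset has the same support), run on the five transactions of the example that follows; the enumeration strategy is illustrative and the printed supports are computed directly from those transactions.

```python
# Brute-force search for frequent closed item sets on a tiny transaction database.
from itertools import combinations

db = {10: set("ACDEF"), 20: set("ABE"), 30: set("CEF"), 40: set("ACDF"), 50: set("CEF")}
minsup = 2

items = sorted(set().union(*db.values()))
support = lambda s: sum(s <= t for t in db.values())

# Enumerate every frequent item set (feasible only because the database is tiny).
frequent = {frozenset(c): support(set(c))
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support(set(c)) >= minsup}

# Closed = no proper superset with exactly the same support.
closed = {s: sup for s, sup in frequent.items()
          if not any(s < t and sup == tsup for t, tsup in frequent.items())}

for s, sup in sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("".join(sorted(s)), sup)
```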
Example
ID | Items
10 | A, C, D, E, F
20 | A, B, E
30 | C, E, F
40 | A, C, D, F
50 | C, E, F

Frequent item sets (minimum support = 2):
A (3), E (4), AE (2), ACDF (2), CF (3), CEF (3): all the closed sets
D (2), AC (2), and 12 more: not closed! Why?

Mining Frequent Closed Item Sets (CLOSET)
(Figure: the CLOSET trace on the example database. The transactions are rewritten in frequency order (C:4, E:4, F:4, A:3, D:2) as CEFAD, EA, CEF, CFAD, CEF, and conditional databases are built for D, A, F, and E in turn; each conditional database contributes its closed item sets to the output, for example CFAD:2 from the D-conditional database and EA:2 from the EA-conditional database.)
Mining with Taxonomies
Taxonomy:
 Clothes: Outerwear (Jackets, Ski Pants), Shirts
 Footwear: Shoes, Hiking Boots

 Generalized association rule: X → Y where no item in Y is an ancestor of an item in X
(A sketch of mining with a taxonomy follows.)
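One common way to mine generalized rules is to extend each transaction with the ancestors of its items and then run ordinary item set mining; below is a minimal sketch of that extension step, with the taxonomy encoded as an assumed child-to-parent map taken from the figure. Rules whose consequent contains an ancestor of an item in the antecedent can then be filtered out, as the definition above requires.

```python
# Sketch: extend transactions with taxonomy ancestors before mining item sets.
parent = {
    "Jackets": "Outerwear", "Ski Pants": "Outerwear",
    "Outerwear": "Clothes", "Shirts": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

def ancestors(item):
    # Walk up the child-to-parent map and yield every ancestor of the item.
    while item in parent:
        item = parent[item]
        yield item

def extend(transaction):
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended

print(extend({"Jackets", "Hiking Boots"}))
# e.g. {'Jackets', 'Outerwear', 'Clothes', 'Hiking Boots', 'Footwear'}
```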

Why Taxonomy?
 The ‘classic’ association rule mining restricts the rules to the leaf nodes of the taxonomy
 However:
 Rules at lower levels may not have minimum
support and thus interesting association may go
undiscovered
 Taxonomies can be used to prune uninteresting
and redundant rules
Example

ID | Items
10 | Shirt
20 | Jacket, Hiking Boots
30 | Ski Pants, Hiking Boots
40 | Shoes
50 | Shoes
60 | Jacket

Item Set | Support
{Jacket} | 2
{Outerwear} | 3
{Clothes} | 4
{Shoes} | 2
{Hiking Boots} | 2
{Footwear} | 4
{Outerwear, Hiking Boots} | 2
{Clothes, Hiking Boots} | 2
{Outerwear, Footwear} | 2
{Clothes, Footwear} | 2

Rule | Support | Confidence
Outerwear → Hiking Boots | 2 | 2/3
Outerwear → Footwear | 2 | 2/3
Hiking Boots → Outerwear | 2 | 2/2
Hiking Boots → Clothes | 2 | 2/2

Interesting Rules
 Many ways in which the interestingness of a rule can be evaluated based on its ancestors
 For example:
 A rule with no ancestors is interesting
 A rule with ancestor(s) is interesting only if it has enough ‘relative support’

Rule ID | Rule | Support | Item | Support
1 | Clothes → Footwear | 10 | Clothes | 5
2 | Outerwear → Footwear | 8 | Outerwear | 2
3 | Jackets → Footwear | 4 | Jackets | 1

 Which rules are interesting?


Discussion
 Association rule mining finds expressions of the form X → Y in large data sets
 One of the most popular data mining tasks
 Originates in market basket analysis
 Key measures of performance
 Support
 Confidence (or accuracy)
 Are support and confidence enough?
Type of Rules Discovered
 ‘Classic’ association rule problem
 All rules satisfying minimum threshold of
support and confidence

 Focus on a subset of rules, e.g.,
 Optimized rules
 Maximal frequent item sets
 Closed item sets
(What makes for an interesting rule?)

Algorithm Construction
 Determine frequent item sets (all or
part)
 By far the most computational time
 Variations focus on this part

 Generate rules from frequent item sets

Generating Item Sets
Algorithms classified by the search space traversal (bottom-up or top-down) and by how support is determined (counting or intersecting):

Bottom-up, counting: Apriori* (and Apriori-like algorithms: AprioriTID, DIC)
Bottom-up, intersecting: Partition
Top-down, counting: FP-Growth*
Top-down, intersecting: Eclat

No algorithm dominates the others!
* Have discussed
Applications
 Market basket analysis
 Classic marketing application

 Applications to recommender systems

Recommender
 Customized goods and services
 Recommend products
 Collaborative filtering
 similarities among users’ tastes
 recommend based on other users
 many on-line systems
 simple algorithms

Classification Approach
 View as classification problem
 Product either of interest or not
 Induce a model, e.g., a decision tree
 Classify a new product as either interesting
or not interesting
 Difficulty in this approach?

Association Rule Approach
 Product associations
 90% of users who like product A and product B also
like product C
A and B → C (90%)
 User associations
 90% of products liked by user A and user B are also
liked by user C
 Use combination of product and user
associations
Advantages
 ‘Classic’ collaborative filtering must identify
users with similar tastes
 This approach uses overlap of other users’
tastes to match given user’s taste
 Can be applied to users whose tastes don’t
correlate strongly with those of other users
 Can take advantage of information from, say, user A for a recommendation to user B, even if they do not correlate

What’s Different Here?
 Is this really a ‘classic’ association rule
problem?
 Want to learn what products are liked by
what users
 ‘Semi-supervised’
 Target item
 User (for user associations)
 Product (for product associations)
Single-Consequent Rules
 Only a single (target) item in the
consequent
 Go through all such items
(Diagram: a spectrum from Association Rules, with all possible item combinations as consequent, through Associations for a Recommender, with single target items as consequent, to Classification, with one single item as consequent.)
