
Unsupervised Learning

 Clustering
 Unsupervised classification, that is, classification without the class attribute
 Want to discover the classes

 Association Rule Discovery
 Discover correlations among attributes

The Clustering Process
 Pattern representation
 Definition of pattern proximity measure
 Clustering
 Data abstraction
 Cluster validation
Pattern Representation
 Number of classes
 Number of available patterns
 Circles, ellipses, squares, etc.
 Feature selection
 Can we use wrappers and filters?
 Feature extraction
 Produce new features
 E.g., principal component analysis (PCA); see the sketch below
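A minimal sketch of PCA-based feature extraction, assuming scikit-learn and NumPy are available; the toy data matrix and the choice of two components are illustrative.

```python
# Feature extraction sketch: project correlated attributes onto principal components.
import numpy as np
from sklearn.decomposition import PCA

# Toy numeric data set: 6 instances, 3 correlated attributes (illustrative values).
X = np.array([
    [1.0,  2.1, 0.9],
    [2.0,  3.9, 2.1],
    [3.0,  6.2, 2.9],
    [4.0,  7.8, 4.2],
    [5.0, 10.1, 5.1],
    [6.0, 12.2, 5.8],
])

# Produce two new features (principal components) from the three original ones.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("transformed patterns:\n", Z)
```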
Pattern Proximity
 Want clusters of instances that are similar
to each other but dissimilar to others
 Need a similarity measure
 Continuous case
 Euclidean measure (compact, isolated clusters)
 The squared Mahalanobis distance

d_M(x_i, x_j) = (x_i - x_j)^T \Sigma^{-1} (x_i - x_j)

where \Sigma is the sample covariance matrix; this alleviates problems with correlated attributes
 Many more measures
Pattern Proximity
 Nominal attributes

d(x_i, x_j) = \frac{n - x}{n}

n = number of attributes, x = number of attributes with the same value (a sketch of these distance measures follows)
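A minimal sketch of the proximity measures above (Euclidean distance, squared Mahalanobis distance, and the nominal mismatch distance), assuming NumPy; the toy vectors and the covariance estimate are illustrative.

```python
# Pattern-proximity sketch: Euclidean, squared Mahalanobis, and nominal mismatch.
import numpy as np

def euclidean(xi, xj):
    return float(np.sqrt(np.sum((xi - xj) ** 2)))

def squared_mahalanobis(xi, xj, cov):
    # d_M(xi, xj) = (xi - xj)^T Sigma^{-1} (xi - xj)
    diff = xi - xj
    return float(diff @ np.linalg.inv(cov) @ diff)

def nominal_distance(xi, xj):
    # d = (n - x) / n, where x is the number of attributes with matching values
    n = len(xi)
    x = sum(a == b for a, b in zip(xi, xj))
    return (n - x) / n

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 5.0], [4.0, 3.5]])  # toy numeric data
cov = np.cov(X, rowvar=False)                                   # estimated covariance

print(euclidean(X[0], X[3]))
print(squared_mahalanobis(X[0], X[3], cov))
print(nominal_distance(("sunny", "hot", "high"), ("sunny", "mild", "high")))
```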

Clustering Techniques
Clustering
 Hierarchical
  Single Link
  Complete Link
  CobWeb
 Partitional
  Square Error: K-means
  Mixture Maximization: Expectation Maximization
Technique Characteristics
 Agglomerative vs Divisive
 Agglomerative: each instance starts in its own cluster and the algorithm merges clusters
 Divisive: begins with all instances in one cluster and divides it up
 Hard vs Fuzzy
 Hard clustering assigns each instance to exactly one cluster, whereas fuzzy clustering assigns a degree of membership in each cluster
More Characteristics
 Monothetic vs Polythetic
 Polythetic: all attributes are used simultaneously, e.g., to
calculate distance (most algorithms)
 Monothetic: attributes are considered one at a time
 Incremental vs Non-Incremental
 With large data sets it may be necessary to consider only
part of the data at a time (data mining)
 Incremental works instance by instance

Hierarchical Clustering
Dendrogram
(Figure: a dendrogram over instances A-G; the instances are listed along the horizontal axis and the vertical axis shows the similarity level at which clusters are merged.)

Hierarchical Algorithms
 Single-link
 Distance between two clusters is the minimum distance over all pairs of instances, one from each cluster
 More versatile
 Produces (sometimes too) elongated clusters
 Complete-link
 Distance between two clusters is the maximum distance over all pairs of instances, one from each cluster
 Produces tightly bound, compact clusters
 Often more useful in practice (see the sketch below)
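A minimal sketch comparing single-link and complete-link agglomerative clustering, assuming SciPy is available; the synthetic data (two groups joined by a chain of points) is an illustrative assumption.

```python
# Single-link vs complete-link agglomerative clustering on toy 2-D data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
group1 = rng.normal([0.0, 0.0], 0.3, size=(20, 2))            # compact group 1
group2 = rng.normal([6.0, 0.0], 0.3, size=(20, 2))            # compact group 2
chain = np.column_stack([np.linspace(1, 5, 8), np.zeros(8)])  # bridge of points
X = np.vstack([group1, group2, chain])

for method in ("single", "complete"):
    Z = linkage(X, method=method)                    # build the hierarchy
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
    print(method, "cluster sizes:", np.bincount(labels)[1:])
```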
Example: Clusters Found
(Figure: the clusters found by single-link (top) and complete-link (bottom) on the same two-dimensional data. The points labeled 1 and 2 show the cluster assignments; a chain of points (*) lies between the two groups, which single-link tends to follow, producing elongated clusters, while complete-link produces compact ones.)

Partitional Clustering
 Output a single partition of the data
into clusters
 Good for large data sets
 Determining the number of clusters is a
major challenge

K-Means
Predetermined number of clusters

Start with seed clusters of one element

(Figure: the initial seed points)
Assign Instances to Clusters

Find New Centroids

New Clusters

Discussion: k-means
 Applicable to fairly large data sets
 Sensitive to initial centers
 Use other heuristics to find good initial
centers
 Converges to a local optimum
 Specifying the number of clusters is largely subjective (a sketch of the algorithm follows below)
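A minimal sketch of the k-means loop described on the preceding slides (seed, assign instances, recompute centroids, repeat), assuming NumPy; the toy data, k = 2, and the convergence test are illustrative choices.

```python
# k-means sketch: seed clusters, assign instances, recompute centroids, repeat.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # seed clusters
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each instance to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned instances.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                                  # converged (local optimum)
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(6.0, 1.0, (30, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```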

Clustering in Weka
 Clustering algorithms in Weka
 K-Means
 Expectation Maximization (EM)
 Cobweb: hierarchical, incremental, and agglomerative

CobWeb
 Algorithm (main) characteristics:
 Hierarchical and incremental
 Uses category utility:

CU(C_1, C_2, \ldots, C_k) = \frac{1}{k} \sum_{l} \Pr[C_l] \sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)

The outer sum runs over the k clusters; the inner sums run over the attributes a_i and over all possible values v_{ij} of each attribute. The term in parentheses is the improvement in the probability estimate due to the instance-to-cluster assignment.
 Why divide by k?

Category Utility
 If each instance is in its own cluster:

\Pr[a_i = v_{ij} \mid C_l] = 1 if v_{ij} is the actual value of the instance, 0 otherwise

 The category utility function then becomes

CU(C_1, C_2, \ldots, C_k) = \frac{n - \sum_i \sum_j \Pr[a_i = v_{ij}]^2}{k}
 Without dividing by k it would always be best for each instance to have its own cluster: overfitting! (A sketch computing category utility follows.)
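A minimal sketch that evaluates the category utility of a hard partition of nominal instances; the helper name and the three weather-like instances (labeled a, e, and f as in the example later in these slides) are illustrative assumptions. Grouping the two similar instances e and f scores higher than putting every instance in its own cluster.

```python
# Category utility of a partition of nominal instances (sketch).
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters, each a list of instances (tuples of nominal values)."""
    all_instances = [inst for cl in clusters for inst in cl]
    n_total, n_attrs, k = len(all_instances), len(all_instances[0]), len(clusters)

    def sum_sq_probs(instances):
        # Sum over attributes i and values v of Pr[a_i = v]^2 within `instances`.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(inst[i] for inst in instances)
            total += sum((c / len(instances)) ** 2 for c in counts.values())
        return total

    base = sum_sq_probs(all_instances)
    weighted = sum(len(cl) / n_total * (sum_sq_probs(cl) - base) for cl in clusters)
    return weighted / k

# Toy instances: (outlook, temperature, humidity, windy).
a = ("sunny", "hot", "high", "false")
e = ("rainy", "cool", "normal", "false")
f = ("rainy", "cool", "normal", "true")
print(category_utility([[a], [e, f]]))    # e and f grouped, a on its own
print(category_utility([[a], [e], [f]]))  # every instance in its own cluster
```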
The Weather Problem
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Weather Data (without Play)
 Label the instances a, b, ..., n
 Start by putting the first instance (a) in its own cluster, then add the next instance (b) in its own cluster
(Figure: cluster {a}, then clusters {a} and {b})

Adding the Third Instance
Evaluate the category utility of adding the instance to one
of the two clusters versus adding it as its own cluster

(Figure: the three candidate partitions after adding c: c merged with a's cluster, c merged with b's cluster, or c in its own cluster. The partition with the highest category utility is kept.)

Adding Instance f
First instance not to get its own cluster:
(Tree: a, b, c, and d in their own clusters; e and f grouped under one node)
Look at the instances:
Rainy Cool Normal FALSE
Rainy Cool Normal TRUE
Quite similar!

Add Instance g
Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
G) Overcast Cool Normal TRUE

(Tree: a, b, c, and d in their own clusters; e, f, and g grouped under one node)

Add Instance h
Look at the instances (the best matching node and the runner-up):
A) Sunny Hot High FALSE
D) Rainy Mild High FALSE
H) Sunny Mild High FALSE
Rearrange: the best matching node and the runner-up (a and d) are merged into a single cluster before h is added.
(Tree: root with children {a, d, h}, b, c, and {e, f, g})
(Splitting is also possible)


Final Hierarchy

(Figure: the final CobWeb hierarchy over all fourteen instances a-n.)
What next?

Dendrogram → Clusters

(Figure: the same hierarchy, cut at a level that places a, b, c, d, h, k, and l in one cluster.)
What do a, b, c, d, h, k, and l have in common?

Numerical Attributes
 Assume normal distribution
CU(C_1, C_2, \ldots, C_k) = \frac{1}{k} \sum_l \Pr[C_l] \, \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)
 Problems with zero variance!
 The acuity parameter imposes a minimum
variance

Hierarchy Size (Scalability)
 May create very large hierarchy
 The cutoff parameter is used to suppress growth:
If CU(C_1, C_2, \ldots, C_k) < Cutoff, the node is cut off.

Discussion
 Advantages
 Incremental → scales to a large number of instances
 Cutoff → limits the size of the hierarchy
 Handles mixed attributes
 Disadvantages
 Incremental → sensitive to the order of instances?
 Arbitrary choice of parameters:
 dividing by k,
 artificial minimum value for the variance of numeric attributes,
 ad hoc cutoff value
Probabilistic Perspective
 Most likely set of clusters given data
 Probability of each instance belonging to a
cluster
 Assumption: instances are drawn from one of
several distributions
 Goal: estimate the parameters of these
distributions
 Usually: assume distributions are normal

Mixture Resolution
 Mixture: a set of k probability distributions
 Represent the k clusters
 Probabilities that an instance takes certain attribute values, given that it is in the cluster
 What is the probability that an instance belongs to a cluster (or a distribution)?

One Numeric Attribute
Two-cluster mixture model (figure: two overlapping normal densities, Cluster A and Cluster B, over a single attribute).

Given some data, how can you determine the parameters?
\mu_A = mean for Cluster A
\sigma_A = standard deviation for Cluster A
\mu_B = mean for Cluster B
\sigma_B = standard deviation for Cluster B
p_A = probability of being in Cluster A
Problems
 If we knew which instance came from each
cluster we could estimate these values
 If we knew the parameters we could calculate
the probability that an instance belongs to
each cluster
\Pr[A \mid x] = \frac{\Pr[x \mid A] \, \Pr[A]}{\Pr[x]} = \frac{f(x; \mu_A, \sigma_A) \, p_A}{\Pr[x]}

f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
EM Algorithm
 Expectation Maximization (EM)
 Start with initial values for the parameters
 Calculate the cluster probabilities for each instance
 Re-estimate the values for the parameters
 Repeat
 General-purpose maximum likelihood estimation algorithm for missing data (the two-cluster case is sketched below)
 Can also be used to train Bayesian networks (later)
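A minimal sketch of the EM loop for the one-attribute, two-cluster mixture described on the previous slides, assuming NumPy; the synthetic data and the initial parameter guesses are illustrative.

```python
# EM for a two-component 1-D Gaussian mixture (sketch).
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.5, 300)])

# Initial guesses for the parameters.
mu_a, sigma_a, mu_b, sigma_b, p_a = -1.0, 1.0, 6.0, 1.0, 0.5

for _ in range(100):
    # E-step: probability that each instance belongs to cluster A.
    wa = p_a * normal_pdf(x, mu_a, sigma_a)
    wb = (1 - p_a) * normal_pdf(x, mu_b, sigma_b)
    resp = wa / (wa + wb)
    # M-step: re-estimate the parameters from the weighted instances.
    mu_a = np.sum(resp * x) / np.sum(resp)
    mu_b = np.sum((1 - resp) * x) / np.sum(1 - resp)
    sigma_a = np.sqrt(np.sum(resp * (x - mu_a) ** 2) / np.sum(resp))
    sigma_b = np.sqrt(np.sum((1 - resp) * (x - mu_b) ** 2) / np.sum(1 - resp))
    p_a = float(resp.mean())

print(mu_a, sigma_a, mu_b, sigma_b, p_a)
```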
Beyond Normal Models
 More than one class:
 Straightforward
 More than one numeric attribute
 Easy if assume attributes independent
 If dependent attributes, treat them jointly
using the bivariate normal
 Nominal attributes
 No more normal distribution!

EM using Weka
 Options
 numClusters: number of clusters (the default, -1, selects it automatically)
 maxIterations: maximum number of iterations
 seed: random number seed
 minStdDev: minimum allowable standard deviation

Other Clustering
 Artificial Neural Networks (ANN)
 Random search
 Genetic Algorithms (GA)
 GA used to find initial centroids for k-means
 Simulated Annealing (SA)
 Tabu Search (TS)
 Support Vector Machines (SVM)
 Will discuss GA and SVM later
Applications
 Image segmentation
 Object and Character Recognition
 Data Mining:
 Stand-alone, to gain insight into the data
 Preprocessing step before classification, which then operates on the detected clusters
DM Clustering Challenges
 Data mining deals with large databases
 Scalability with respect to the number of instances
 Use a random sample (possible bias)
 Dealing with mixed data
 Many algorithms only make sense for numeric
data
 High dimensional problems
 Can the algorithm handle many attributes?
 How do we interpret a cluster in high dimensions?

Other (General) Challenges
 Shape of clusters
 Minimum domain knowledge (e.g.,
knowing the number of clusters)
 Noisy data
 Insensitivity to instance order
 Interpretability and usability

Clustering for DM
 Main issue is scalability to large databases
 Many algorithms have been developed for
scalable clustering:
 Partitional methods: CLARA, CLARANS
 Hierarchical methods: AGNES, DIANA, BIRCH,
CURE, Chameleon

Practical Partitional Clustering
Algorithms
 Classic k-Means (1967)
 Work from 1990 and later:
 k-Medoids
 Uses the medoid instead of the centroid
 Less sensitive to outliers and noise
 Computations are more costly
 PAM (Partitioning Around Medoids) algorithm

Large-Scale Problems
 CLARA: Clustering LARge Applications
 Select several random samples of instances
 Apply PAM to each
 Return the best clusters
 CLARANS:
 Similar to CLARA
 Draws samples randomly while searching
 More effective than PAM and CLARA
Hierarchical Methods
 BIRCH: Balanced Iterative Reducing and
Clustering using Hierarchies
 Clustering feature: a triplet (N, LS, SS) summarizing a subcluster: the number of points, their linear sum, and their sum of squared values (see the sketch below)
 Clustering feature (CF) tree: a height-balanced tree that stores the clustering features
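A minimal sketch of the clustering feature idea, assuming the triplet form CF = (N, LS, SS); the CF class, the two-dimensional points, and the radius computation shown are illustrative, not BIRCH's actual implementation.

```python
# Sketch of a BIRCH-style clustering feature: CF = (N, LS, SS).
import numpy as np
from dataclasses import dataclass, field

@dataclass
class CF:
    n: int = 0                                                   # number of points
    ls: np.ndarray = field(default_factory=lambda: np.zeros(2))  # linear sum
    ss: float = 0.0                                              # sum of squared norms

    def add(self, x):
        self.n += 1
        self.ls = self.ls + x
        self.ss += float(x @ x)

    def merge(self, other):
        # Two subcluster summaries combine by simple addition.
        return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Root of the average squared distance to the centroid, from (N, LS, SS) alone.
        return float(np.sqrt(max(self.ss / self.n - self.centroid() @ self.centroid(), 0.0)))

cf1, cf2 = CF(), CF()
for x in np.array([[0.0, 0.0], [1.0, 1.0]]):
    cf1.add(x)
for x in np.array([[5.0, 5.0], [6.0, 4.0]]):
    cf2.add(x)
merged = cf1.merge(cf2)
print(merged.centroid(), merged.radius())
```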

BIRCH Mechanism
 Phase I:
 Scan database to build an initial CF tree
 Multilevel compression of the data
 Phase II:
 Apply a selected clustering algorithm to the
leaf nodes of the CF tree
 Has been found to be very scalable

Conclusion
 The use of clustering in data mining
practice seems to be somewhat limited
due to scalability problems
 More commonly used unsupervised
learning:

Association Rule Discovery


Association Rule Discovery
 Aims to discover interesting correlations or other relationships in large databases
 Finds rules of the form
if A and B then C and D
 Which attributes will be included in the
relation is unknown

Mining Association Rules

 Similar to classification rules
 Use the same procedure?
 Every attribute is treated the same: any attribute may appear on the right-hand side
 Apply the procedure to every possible expression on the right-hand side
 Huge number of rules → infeasible
 Only want rules with high coverage/support

Market Basket Analysis

 Basket data: items purchased on a per-transaction basis (not cumulative)
 How do you boost the sales of a given product?
 What other products does discontinuing a product impact?
 Which products should be shelved together?
 Terminology (market basket analysis):
 Item: an attribute/value pair
 Item set: a combination of items with minimum coverage

How Many k-Item Sets Have
Minimum Coverage?
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Item Sets
1-Item | 2-Item | 3-Item | 4-Item
Outlook=sunny (5) | Outlook=sunny temp=mild (2) | Outlook=sunny temp=hot humidity=high (2) | Outlook=sunny temp=hot humidity=high play=no (2)
Outlook=overcast (4) | Outlook=sunny temp=hot (2) | Outlook=sunny temp=hot play=no (2) | Outlook=sunny humidity=high windy=false play=no (2)
Outlook=rainy (5) | Outlook=sunny humidity=norm (2) | Outlook=sunny humidity=norm play=yes (2) | Outlook=overcast temp=hot windy=false play=yes (2)
Temp=cool (4) | Outlook=sunny windy=true (2) | Outlook=sunny humidity=high windy=false (2) | Outlook=rainy temp=mild windy=false play=yes (2)
Temp=mild (6) | Outlook=sunny windy=true (2) | Outlook=sunny humidity=high play=no (3) | Outlook=rainy humidity=norm windy=false play=yes (2)
(first five rows shown; coverage in parentheses)
From Sets to Rules
3-Item Set w/coverage 4:
Humidity = normal, windy = false, play = yes

Association Rules: Accuracy

If humidity = normal and windy = false then play = yes 4/4


If humidity = normal and play = yes then windy = false 4/6
If windy = false and play = yes then humidity = normal 4/6
If humidity = normal then windy = false and play = yes 4/7
If windy = false then humidity = normal and play = yes 4/8
If play = yes then humidity = normal and windy = false 4/9
If - then humidity = normal and windy = false and play=yes 4/12

From Sets to Rules
(continued)
4-Item Set w/coverage 2:
Temperature = cool, humidity = normal,
windy = false, play = yes

Association Rules: Accuracy

 If temperature = cool, windy = false → humidity = normal, play = yes 2/2
 If temperature = cool, humidity = normal, windy = false → play = yes 2/2
 If temperature = cool, windy = false, play = yes → humidity = normal 2/2

Overall

 Minimum coverage (2):
 12 1-item sets, 47 2-item sets, 39 3-item sets, 6 4-item sets
 Minimum accuracy (100%):
 58 association rules

“Best” Rules (Coverage = 4, Accuracy = 100%)
 If humidity = normal and windy = false → play = yes
 If temperature = cool → humidity = normal
 If outlook = overcast → play = yes
Association Rule Mining

 STEP 1: Find all item sets that meet minimum coverage
 STEP 2: Find all rules that meet minimum accuracy
 STEP 3: Prune
Generating Item Sets
 How do we generate minimum coverage item
sets in a scalable manner?
 Total number of item set is huge
 Grows exponentially in the number of attributes
 Need an efficient algorithm:
 Start by generating minimum coverage 1-item sets
 Use those to generate 2-item sets, etc
 Why do we only need to consider minimum
coverage 1-item sets?

Justification
Item Set 1: {Humidity = high}
Coverage(1) = number of times humidity is high

Item Set 2: {Windy = false}
Coverage(2) = number of times windy is false

Item Set 3: {Humidity = high, Windy = false}
Coverage(3) = number of times humidity is high and windy is false

Coverage(3) ≤ Coverage(1) and Coverage(3) ≤ Coverage(2), so if Item Sets 1 and 2 do not both meet minimum coverage, Item Set 3 cannot either.
Generating Item Sets
Start with all 3-item sets that meet minimum coverage:
(A B C), (A B D), (A C D), (A C E)

Merge to generate 4-item sets, considering only sets that start with the same two attributes; there are only two 4-item sets that could possibly work:
(A B C D), (A C D E)

These are candidate 4-item sets; whether they have minimum coverage must still be checked. (A sketch of this level-wise generation follows.)
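A minimal sketch of the level-wise generation just illustrated: merge frequent item sets that share all but one item, prune candidates that have an infrequent subset, then check coverage. The function name, the toy basket transactions, and minsup = 2 are illustrative assumptions.

```python
# Apriori-style level-wise generation of frequent item sets (sketch).
from itertools import combinations

def frequent_item_sets(transactions, minsup):
    transactions = [frozenset(t) for t in transactions]
    support = lambda s: sum(s <= t for t in transactions)
    # Start with the 1-item sets that meet minimum coverage.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= minsup}
    frequent = {s: support(s) for s in level}
    k = 1
    while level:
        # Merge sets with all but one item in common, keep only candidates
        # whose every k-subset is already frequent, then check coverage.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {c for c in candidates if support(c) >= minsup}
        frequent.update({s: support(s) for s in level})
        k += 1
    return frequent

db = [{"bread", "milk"}, {"bread", "beer", "eggs"}, {"milk", "beer", "cola"},
      {"bread", "milk", "beer"}, {"bread", "milk", "cola"}]
for itemset, sup in sorted(frequent_item_sets(db, minsup=2).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```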

Algorithm for Generating Item
Sets
 Build up from 1-item sets so that we only consider item sets that are found by merging two minimum-coverage sets
 Only consider sets that have all but one item in common
 Computational efficiency further
improved using hash tables

Generating Rules
If windy = false and play = no then outlook = sunny and humidity = high
(meets minimum coverage and accuracy)

If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
(both also meet minimum coverage and accuracy)

How Many Rules?
 Want to consider every possible subset
of attributes as consequent
 Have 4 attributes:
 Four single consequent rules
 Six double consequent rules
 Two triple consequent rules
 Twelve possible rules for single 4-item set!
 Exponential explosion of possible rules
Must We Check All?
If A and B then C and D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A and B are true)

If A, B, and C then D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A, B, and C are true)

Efficiency Improvement
 A double-consequent rule can only be OK if both single-consequent rules are OK
 Procedure:
 Start with single-consequent rules
 Build up double-consequent rules, etc.
 Generate candidate rules
 Check the candidates for accuracy
 In practice: far fewer rules need to be checked (see the sketch below)
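A companion sketch for the rule-building step, restricted to single-consequent rules for brevity; the hand-made frequent dictionary (item set to coverage) and the accuracy threshold are illustrative assumptions.

```python
# Sketch: generate single-consequent rules from frequent item sets.
# `frequent` maps a frozenset of items to its coverage; a tiny hand-made
# example stands in for the output of a real item set mining step.
frequent = {
    frozenset(["bread"]): 4, frozenset(["milk"]): 4, frozenset(["beer"]): 3,
    frozenset(["bread", "milk"]): 3, frozenset(["milk", "beer"]): 2,
}

def single_consequent_rules(frequent, min_accuracy):
    rules = []
    for itemset, coverage in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Accuracy = coverage(whole set) / coverage(antecedent).
            accuracy = coverage / frequent[antecedent]
            if accuracy >= min_accuracy:
                rules.append((set(antecedent), consequent, coverage, accuracy))
    return rules

for ant, cons, cov, acc in single_consequent_rules(frequent, min_accuracy=0.6):
    print(f"if {sorted(ant)} then {cons}  (coverage={cov}, accuracy={acc:.2f})")
```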

Apriori Algorithm
 This is a simplified description of the
Apriori algorithm
 Developed in the early 1990s; the most commonly used approach
 New developments focus on
 Generating item sets more efficiently
 Generating rules from item sets more
efficiently

Association Rule Discovery
using Weka
 Parameters to be specified in Apriori:
 upperBoundMinSupport: start with this value
of minimum support
 delta: in each step decrease the minimum
support required by this value
 lowerBoundMinSupport: final minimum
support
 numRules: how many rules are generated
 metricType: confidence, lift, leverage, conviction
 minMetric: smallest acceptable value for a rule
 Handles only nominal attributes
Difficulties
 Apriori algorithm improves performance
by using candidate item sets
 Still some problems …
 Costly to generate a large number of item sets
 To generate a frequent pattern of size 100, more than 2^100 ≈ 10^30 candidates are needed!
 Requires repeated scans of the database to check candidates
 Again, most problematic for long patterns
Solution?
 Can candidate generation be avoided?
 New approach:
 Create a frequent pattern tree (FP-tree)
 stores information on frequent patterns
 Use the FP-tree for mining frequent
patterns
 partitioning-based
 divide-and-conquer
 (as opposed to bottom-up generation)
Database → FP-Tree

TID | Items | Frequent Items
100 | F,A,C,D,G,I,M,P | F,C,A,M,P
200 | A,B,C,F,L,M,O | F,C,A,B,M
300 | B,F,H,J,O | F,B
400 | B,C,K,S,P | C,B,P
500 | A,F,C,E,L,P,M,N | F,C,A,M,P
(Minimum support = 3)

(Figure: the resulting FP-tree. The header table lists the items F, C, A, B, M, P, each with the head of its node links. Under the root: a branch F:4 with children C:3 and B:1; C:3 has child A:3; A:3 has children M:2, followed by P:2, and B:1, followed by M:1; a second branch C:1 has child B:1, followed by P:1. A construction sketch follows below.)
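A compact sketch of the two-scan FP-tree construction, with an assumed node class and a header table standing in for the node links; ties in item frequency are broken alphabetically here, so the item order may differ slightly from the F, C, A, B, M, P order shown on the slide.

```python
# Two-scan FP-tree construction with a header table of node links (sketch).
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}                     # item -> child Node

def build_fp_tree(transactions, minsup):
    # Scan 1: count items and keep only the frequent ones.
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= minsup}
    order = lambda t: sorted((i for i in t if i in frequent),
                             key=lambda i: (-counts[i], i))   # consistent ordering
    root = Node(None, None)
    header = defaultdict(list)                 # item -> nodes carrying that item
    # Scan 2: insert each transaction, most frequent items first.
    for t in transactions:
        node = root
        for item in order(t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])   # thread the node link
            else:
                node.children[item].count += 1
            node = node.children[item]
    return root, header

# The five transactions from the slide (frequent items F, C, A, B, M, P at minsup 3).
db = [list("FACDGIMP"), list("ABCFLMO"), list("BFHJO"), list("BCKSP"), list("AFCELPMN")]
root, header = build_fp_tree(db, minsup=3)
for item, nodes in header.items():
    print(item, "node counts:", [n.count for n in nodes])
```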
Computational Effort
 Each node has three fields
 item name
 count
 node link
 Also a header table with
 item name
 head of node link
 Need two scans of the database
 Collect set of frequent items
 Construct the FP-tree
Comments
 The FP-tree is a compact data structure
 The FP-tree contains all the information
related to mining frequent patterns (given the
support)
 The size of the tree is bounded by the
occurrences of frequent items
 The height of the tree is bounded by the
maximum number of items in a transaction
Mining Patterns
 Mine the complete set of frequent patterns
 For any frequent item A, all possible patterns containing A can be obtained by following A's node links, starting from A's head of node links

Example
(Figure: the FP-tree from the previous slide, following item P's node links.)
Frequent pattern: (P:3)
Paths containing P:
<F:4, C:3, A:3, M:2, P:2>, where P occurs twice
<C:1, B:1, P:1>, where P occurs once
Rule Generation
 Mining complete set of association rules
has some problems
 May be a large number of frequent item
sets
 May be a huge number of association rules

 One potential solution is to look at


closed item sets only
Frequent Closed Item Sets
 An item set X is a closed item set if there is no item set X' such that X ⊂ X' and every transaction containing X also contains X'
 A rule X → Y is an association rule on frequent closed item sets if
 both X and X∪Y are frequent closed item sets, and
 there does not exist a frequent closed item set Z such that X ⊂ Z ⊂ X∪Y
(A brute-force sketch of the closedness test follows.)
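A brute-force sketch of the closedness test implied by this definition (an item set is closed if no proper superset has the same support), run on the five transactions of the example that follows; the enumeration strategy is illustrative and the printed supports are computed directly from those transactions.

```python
# Brute-force search for frequent closed item sets on a tiny transaction database.
from itertools import combinations

db = {10: set("ACDEF"), 20: set("ABE"), 30: set("CEF"), 40: set("ACDF"), 50: set("CEF")}
minsup = 2

items = sorted(set().union(*db.values()))
support = lambda s: sum(s <= t for t in db.values())

# Enumerate every frequent item set (feasible only because the database is tiny).
frequent = {frozenset(c): support(set(c))
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support(set(c)) >= minsup}

# Closed = no proper superset with exactly the same support.
closed = {s: sup for s, sup in frequent.items()
          if not any(s < t and sup == tsup for t, tsup in frequent.items())}

for s, sup in sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("".join(sorted(s)), sup)
```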
Example
ID | Items
10 | A, C, D, E, F
20 | A, B, E
30 | C, E, F
40 | A, C, D, F
50 | C, E, F

Frequent item sets (minimum support = 2):
A (3), E (4), AE (2), ACDF (2), CF (3), CEF (3): all the closed sets
D (2), AC (2), and 12 more: not closed! Why?

Mining Frequent Closed Item Sets (CLOSET)
(Figure: the CLOSET trace on the example database. The transactions are rewritten in frequency order (C:4, E:4, F:4, A:3, D:2) as CEFAD, EA, CEF, CFAD, CEF, and conditional databases are built for D, A, F, and E in turn; each conditional database contributes its closed item sets to the output, for example CFAD:2 from the D-conditional database and EA:2 from the EA-conditional database.)
Mining with Taxonomies
Taxonomy:
 Clothes: Outerwear (Jackets, Ski Pants), Shirts
 Footwear: Shoes, Hiking Boots

 Generalized association rule: X → Y where no item in Y is an ancestor of an item in X
(A sketch of mining with a taxonomy follows.)
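One common way to mine generalized rules is to extend each transaction with the ancestors of its items and then run ordinary item set mining; below is a minimal sketch of that extension step, with the taxonomy encoded as an assumed child-to-parent map taken from the figure. Rules whose consequent contains an ancestor of an item in the antecedent can then be filtered out, as the definition above requires.

```python
# Sketch: extend transactions with taxonomy ancestors before mining item sets.
parent = {
    "Jackets": "Outerwear", "Ski Pants": "Outerwear",
    "Outerwear": "Clothes", "Shirts": "Clothes",
    "Shoes": "Footwear", "Hiking Boots": "Footwear",
}

def ancestors(item):
    # Walk up the child-to-parent map and yield every ancestor of the item.
    while item in parent:
        item = parent[item]
        yield item

def extend(transaction):
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended

print(extend({"Jackets", "Hiking Boots"}))
# e.g. {'Jackets', 'Outerwear', 'Clothes', 'Hiking Boots', 'Footwear'}
```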

Why Taxonomy?
 The ‘classic’ association rule mining restricts the rules to the leaf nodes of the taxonomy
 However:
 Rules at lower levels may not have minimum
support and thus interesting association may go
undiscovered
 Taxonomies can be used to prune uninteresting
and redundant rules
Example

ID | Items
10 | Shirt
20 | Jacket, Hiking Boots
30 | Ski Pants, Hiking Boots
40 | Shoes
50 | Shoes
60 | Jacket

Item Set | Support
{Jacket} | 2
{Outerwear} | 3
{Clothes} | 4
{Shoes} | 2
{Hiking Boots} | 2
{Footwear} | 4
{Outerwear, Hiking Boots} | 2
{Clothes, Hiking Boots} | 2
{Outerwear, Footwear} | 2
{Clothes, Footwear} | 2

Rule | Support | Confidence
Outerwear → Hiking Boots | 2 | 2/3
Outerwear → Footwear | 2 | 2/3
Hiking Boots → Outerwear | 2 | 2/2
Hiking Boots → Clothes | 2 | 2/2

Interesting Rules
 Many ways in which the interestingness of a rule can be evaluated based on its ancestors
 For example:
 A rule with no ancestors is interesting
 A rule with ancestor(s) is interesting only if it has enough ‘relative support’

Rule ID | Rule | Support | Item | Support
1 | Clothes → Footwear | 10 | Clothes | 5
2 | Outerwear → Footwear | 8 | Outerwear | 2
3 | Jackets → Footwear | 4 | Jackets | 1

 Which rules are interesting?


Discussion
 Association rule mining finds expressions of the form X → Y in large data sets
 One of the most popular data mining tasks
 Originates in market basket analysis
 Key measures of performance
 Support
 Confidence (or accuracy)
 Are support and confidence enough?
Type of Rules Discovered
 ‘Classic’ association rule problem
 All rules satisfying minimum threshold of
support and confidence

 Focus on a subset of rules, e.g.,
 Optimized rules
 Maximal frequent item sets
 Closed item sets
(What makes for an interesting rule?)

Algorithm Construction
 Determine frequent item sets (all or
part)
 By far the most computational time
 Variations focus on this part

 Generate rules from frequent item sets

Generating Item Sets
Algorithms classified by the search space traversal (bottom-up or top-down) and by how support is determined (counting or intersecting):

Bottom-up, counting: Apriori* (and Apriori-like algorithms: AprioriTID, DIC)
Bottom-up, intersecting: Partition
Top-down, counting: FP-Growth*
Top-down, intersecting: Eclat

No algorithm dominates the others!
* Have discussed
Applications
 Market basket analysis
 Classic marketing application

 Applications to recommender systems

Recommender
 Customized goods and services
 Recommend products
 Collaborative filtering
 similarities among users’ tastes
 recommend based on other users
 many on-line systems
 simple algorithms

Classification Approach
 View as classification problem
 Product either of interest or not
 Induce a model, e.g., a decision tree
 Classify a new product as either interesting
or not interesting
 Difficulty in this approach?

Association Rule Approach
 Product associations
 90% of users who like product A and product B also
like product C
A and B → C (90%)
 User associations
 90% of products liked by user A and user B are also
liked by user C
 Use combination of product and user
associations
Advantages
 ‘Classic’ collaborative filtering must identify
users with similar tastes
 This approach uses overlap of other users’
tastes to match given user’s taste
 Can be applied to users whose tastes don’t
correlate strongly with those of other users
 Can take advantage of information from, say, user A for a recommendation to user B, even if they do not correlate

What’s Different Here?
 Is this really a ‘classic’ association rule
problem?
 Want to learn what products are liked by
what users
 ‘Semi-supervised’
 Target item
 User (for user associations)
 Product (for product associations)
Single-Consequent Rules
 Only a single (target) item in the
consequent
 Go through all such items
(Diagram: a spectrum from Association Rules, with all possible item combinations as consequent, through Associations for a Recommender, with single target items as consequent, to Classification, with one single item as consequent.)
