Roadmap
Unsupervised learning
Clustering categories
Clustering algorithms
K-means
Fuzzy c-means
Kernel-based
Graph-based
Q&A
Unsupervised learning
Definition 1
Supervised: human effort involved
Unsupervised: no human effort
Definition 2
Supervised: learning the conditional distribution P(Y|X) (X: data, Y: labels)
Unsupervised: learning the distribution P(X) of the data itself
Clustering
What is clustering?
Clustering
Definition
Assignment of a set of observations into subsets (clusters) so that observations in the same cluster are similar in some sense
Clustering
Hard vs. Soft
Hard: the same object can belong to only a single cluster
Soft: the same object can belong to several clusters, with degrees of membership
E.g. Gaussian mixture model (illustrated below)
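As a quick illustration of soft assignments, a minimal sketch using scikit-learn's GaussianMixture on synthetic data (the two-blob data set and all parameter choices are assumptions for illustration, not from the slides):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 2-D blobs
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(5.0, 1.0, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft = gmm.predict_proba(X)   # each row: membership of one point in each cluster
hard = gmm.predict(X)         # argmax of the memberships -> hard labels
print(soft[:3])               # rows like [0.99, 0.01]: soft memberships summing to 1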
Clustering
Flat vs. Hierarchical
Flat: no structure among the clusters (a single partition of the data)
Hierarchical: clusters form a tree
Agglomerative
Divisive
Hierarchical clustering
Agglomerative (Bottom-up)
Compute all pairwise pattern-pattern similarity coefficients
Place each of the n patterns into a class of its own
Merge the two most similar clusters into one
Replace the two clusters with the new cluster
Re-compute inter-cluster similarity scores w.r.t. the new cluster
Repeat the last three steps until only k clusters are left (a code sketch follows)
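A minimal sketch of this procedure using SciPy's hierarchical-clustering routines (single-link similarity and the toy data are assumptions; the slides do not prescribe a linkage rule):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(4, 0.5, (10, 2))])

# 'single' linkage repeatedly merges the two clusters with the closest members,
# mirroring the merge step above; Z records the sequence of merges (the dendrogram).
Z = linkage(X, method='single')
labels = fcluster(Z, t=2, criterion='maxclust')  # stop when k = 2 clusters are left
print(labels)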
Hierarchical clustering
Agglomerative (Bottom-up)
[Figure: iterations 1-5 of the merge process on a toy point set; after the final merge, k clusters are left]
Hierarchical clustering
Divisive (Top-down)
Start at the top with all patterns in one cluster
The cluster is split using a flat clustering algorithm
This procedure is applied recursively until each pattern is in its own singleton cluster
Hierarchical clustering
Divisive (Top-down)
A flat clustering algorithm (e.g., 2-means) is used as the splitting subroutine
Which one is more efficient?
Which one is more accurate?
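A hedged sketch of the divisive procedure, using 2-means as the flat subroutine (the recursion, the helper name, and the data are illustrative assumptions, not from the slides):

import numpy as np
from sklearn.cluster import KMeans

def divisive(X, indices=None):
    """Recursively bisect with 2-means until every pattern is a singleton;
    returns the tree as nested lists of data indices (hypothetical helper)."""
    if indices is None:
        indices = np.arange(len(X))
    if len(indices) <= 1:
        return indices.tolist()
    # Flat clustering algorithm (here: 2-means) used as the splitting subroutine
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[indices])
    return [divisive(X, indices[labels == 0]), divisive(X, indices[labels == 1])]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .5, (4, 2)), rng.normal(4, .5, (4, 2))])
print(divisive(X))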
Data set: $X = \{x_1, x_2, \ldots, x_n\}$
Clusters: $C_1, C_2, \ldots, C_k$
Codebook: $V = \{v_1, v_2, \ldots, v_k\}$
Partition matrix: $U = [\mu_{ij}]$
K-means
Minimizes the functional:
$E(\mu, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} \mu_{ij} \, \| x_j - v_i \|^2, \qquad \mu_{ij} = \begin{cases} 1 & \text{if } x_j \in C_i \\ 0 & \text{otherwise} \end{cases}$
Iterative algorithm:
Initialize the codebook V with vectors randomly picked from the data set
Assign each pattern to its nearest code vector, then recompute each code vector as the mean of the patterns assigned to it
Repeat the two steps until the assignments stop changing (sketched below)
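A minimal NumPy sketch of the K-means loop described above (initialization by sampling data points and the simple convergence test are common choices, assumed here):

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the codebook V with k vectors randomly picked from the data set
    V = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: mu[i, j] = 1 iff x_j is closest to v_i
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)  # (n, k) distances
        labels = d.argmin(axis=1)
        # Update step: move each center to the mean of its assigned patterns
        newV = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else V[i]
                         for i in range(k)])
        if np.allclose(newV, V):
            break
        V = newV
    E = (d.min(axis=1) ** 2).sum()  # the objective E(mu, V)
    return labels, V, E

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
labels, V, E = kmeans(X, k=2)
print(V, E)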
K-means
Disadvantages
Dependent on initialization
[Figure: different random initializations converge to different final partitions]
Remedies: select random seeds that are at least some distance $D_{min}$ apart, or run the algorithm many times and keep the best result (see the sketch after this slide)
Sensitive to outliers
Remedy: use K-medoids, which restricts cluster prototypes to actual data points
Can deal only with clusters with spherically symmetric point distributions
Remedy: kernel-based methods (covered later in this deck)
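The "run many times" remedy, sketched with the kmeans function from the previous sketch (keeping the run with the lowest objective E):

# Run K-means from several random seeds and keep the run with the best objective
best = min((kmeans(X, k=2, seed=s) for s in range(10)), key=lambda r: r[2])
labels, V, E = best
print(E)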
Deciding K
Try several values of K and compare the objective function:
k = 1: objective = 873.0
k = 2: objective = 173.1
k = 3: objective = 133.6
Deciding K
We can plot the objective function values for k = 1 to 6
The abrupt change at k = 2 is highly suggestive of two clusters
This is called knee finding or elbow finding
Note that the results are not always as clear-cut as in this toy example
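A sketch of the knee/elbow plot, reusing X and the kmeans function from the earlier sketch (matplotlib assumed available; the objective values will differ from the toy numbers above):

import matplotlib.pyplot as plt

ks = range(1, 7)
objectives = [kmeans(X, k, seed=0)[2] for k in ks]  # kmeans sketch from above
plt.plot(list(ks), objectives, marker='o')
plt.xlabel('k'); plt.ylabel('objective E')
plt.show()  # look for the abrupt bend (the "elbow")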
Fuzzy C-means
Soft clustering
Minimize the functional:
$E_m(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2, \quad m > 1$ (the fuzzifier, usually set to 2)
Data set: $X = \{x_1, x_2, \ldots, x_n\}$; Clusters: $C_1, C_2, \ldots, C_k$; Codebook: $V = \{v_1, v_2, \ldots, v_k\}$
Fuzzy partition matrix: $U = [u_{ij}]$, $u_{ij} \in [0, 1]$
Compare K-means: $E(\mu, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} \mu_{ij} \, \| x_j - v_i \|^2$ with hard memberships $\mu_{ij} \in \{0, 1\}$ ($\mu_{ij} = 1$ if $x_j \in C_i$, 0 otherwise)
Constraint: $\sum_{i=1}^{k} u_{ij} = 1, \quad j = 1, \ldots, n$
Fuzzy C-means
Minimize
$E_m(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2$
subject to
$\sum_{i=1}^{k} u_{ij} = 1, \quad j = 1, \ldots, n$
How do we solve this constrained optimization problem?
Fuzzy C-means
Introduce Lagrange multipliers $\lambda_j$, one per constraint:
$L(U, V, \lambda) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2 + \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{k} u_{ij} - 1 \right)$
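Setting the partial derivatives of the Lagrangian to zero gives the update equations on the next slide; the membership derivation, sketched:

\[
\frac{\partial L}{\partial u_{ij}} = m\, u_{ij}^{\,m-1} \| x_j - v_i \|^2 + \lambda_j = 0
\quad\Longrightarrow\quad
u_{ij} = \left( \frac{-\lambda_j}{m \| x_j - v_i \|^2} \right)^{1/(m-1)}
\]
Substituting into the constraint \(\sum_{i=1}^{k} u_{ij} = 1\) eliminates \(\lambda_j\) and yields
\[
u_{ij} = \frac{1}{\sum_{l=1}^{k} \left( \dfrac{\| x_j - v_i \|}{\| x_j - v_l \|} \right)^{2/(m-1)}}.
\]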
Iterative optimization
Membership update:
$u_{ij} = \dfrac{1}{\sum_{l=1}^{k} \left( \dfrac{\| x_j - v_i \|}{\| x_j - v_l \|} \right)^{2/(m-1)}}$
Center update:
$v_i = \dfrac{\sum_{j=1}^{n} u_{ij}^m \, x_j}{\sum_{j=1}^{n} u_{ij}^m}$
Alternate the two updates until convergence (sketched below)
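A minimal NumPy sketch of the alternating FCM updates above (random initialization from the data and a small eps to avoid division by zero are assumed):

import numpy as np

def fuzzy_c_means(X, k, m=2.0, iters=100, seed=0, eps=1e-9):
    """Minimal FCM sketch following the update equations above."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=k, replace=False)]  # initial codebook
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps  # (n, k)
        # Membership update: u_ij = 1 / sum_l (d_ij / d_lj)^(2/(m-1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))     # (n, k, k)
        U = 1.0 / ratio.sum(axis=2)                                      # (n, k)
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        W = U ** m
        newV = (W.T @ X) / W.sum(axis=0)[:, None]
        if np.allclose(newV, V):
            break
        V = newV
    return U, V

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
U, V = fuzzy_c_means(X, k=2)
print(U[:3])  # soft memberships; each row sums to 1 across the clusters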
Application to image segmentation
[Figure: original images and their FCM segmentations]
Homogeneous intensity corrupted by 5% Gaussian noise: accuracy = 96.02%
Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise: accuracy = 94.41%
(Images: Dao-Qiang Zhang, Song-Can Chen)
Kernel K-means
K-means: $E(\mu, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} \mu_{ij} \, \| x_j - v_i \|^2$
Fuzzy C-means: $E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2$
Map the data into a feature space via $\Phi$ and replace the squared distance:
$\| \Phi(x_j) - \Phi(v_i) \|^2 = \Phi(x_j)^T \Phi(x_j) - 2\, \Phi(x_j)^T \Phi(v_i) + \Phi(v_i)^T \Phi(v_i) = K(x_j, x_j) - 2 K(x_j, v_i) + K(v_i, v_i)$
Kernelized objective: $E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| \Phi(x_j) - \Phi(v_i) \|^2$
For a Gaussian kernel, $K(x, x) = 1$, so
$E(U, V) = 2 \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \left( 1 - K(x_j, v_i) \right)$
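A short sketch of the kernelized quantities with a Gaussian kernel (sigma is a free parameter assumed here; this only evaluates the objective, it is not the full update loop):

import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # K(a, b) = exp(-||a - b||^2 / sigma^2); K(x, x) = 1, which gives
    # ||Phi(x_j) - Phi(v_i)||^2 = 2 * (1 - K(x_j, v_i))
    return np.exp(-np.sum((a - b) ** 2) / sigma ** 2)

def kfcm_objective(X, U, V, m=2.0, sigma=1.0):
    """Kernelized objective for memberships U of shape (n, k) and centers V of shape (k, d)."""
    D = np.array([[1.0 - gaussian_kernel(x, v, sigma) for v in V] for x in X])  # (n, k)
    return 2.0 * ((U ** m) * D).sum()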
Kernel K-means with spatial constraints (SFCM / SKFCM)
Add a spatial penalty term that couples the membership of each pixel to its neighbors:
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \left( 1 - K(x_j, v_i) \right) + \frac{\alpha}{N_R} \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \sum_{x_r \in N_j} \left( 1 - u_{ir} \right)^m$
$N_j$: the set of neighbors in a window around $x_j$; $N_R$: the cardinality of $N_j$
$\alpha$ controls the effect of the penalty term
The penalty term is minimized when the membership value for $x_j$ is large and also large at the neighboring pixels, and vice versa
[Figure: a 3×3 window of pixels whose memberships all equal 0.9, illustrating agreement among neighbors]
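A sketch of just the penalty term from the objective above (the neighbor lists and alpha are illustrative assumptions; a real implementation would build N_j from the image grid):

def spatial_penalty(U, neighbors, alpha, m=2.0):
    """Penalty = (alpha / N_R) * sum_i sum_j u_ij^m * sum_{r in N_j} (1 - u_ir)^m.
    neighbors[j] lists the indices of the window around pixel j."""
    n, k = U.shape
    N_R = len(neighbors[0])  # window cardinality (assumed equal for all pixels)
    total = 0.0
    for j in range(n):
        for i in range(k):
            total += (U[j, i] ** m) * sum((1.0 - U[r, i]) ** m for r in neighbors[j])
    return alpha / N_R * total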
Homogeneous intensity corrupted by 5% Gaussian noise:
FCM accuracy = 96.02%
KFCM accuracy = 96.51%
SFCM accuracy = 99.34%
SKFCM accuracy = 100.00%
(Images: Dao-Qiang Zhang, Song-Can Chen)
Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise:
FCM accuracy = 94.41%
KFCM accuracy = 91.11%
SFCM accuracy = 98.41%
SKFCM accuracy = 99.88%
(Images: Dao-Qiang Zhang, Song-Can Chen)
[Figure: KFCM, SFCM, and SKFCM segmentation results side by side]
Graph Theory-Based
Use graph theory to solve the clustering problem
Graph terminology
Adjacency matrix
Degree
Volume
Cuts
Algorithm
Given an image, set up a weighted graph $G = (V, E)$: each node is a pixel, and each edge weight encodes the similarity between the pixels it joins
Partition the graph by minimizing the normalized cut (Ncut); in practice, threshold the eigenvector of the second smallest eigenvalue of the generalized eigensystem $(D - W)\, y = \lambda D\, y$ (see the sketch below)
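A minimal sketch of the spectral bipartition step (the Gaussian affinity, its sigma, and thresholding the eigenvector at zero are standard but assumed choices):

import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(X, sigma=1.0):
    """Split points into two groups using the eigenvector of the second
    smallest generalized eigenvalue of (D - W) y = lambda D y."""
    # Weighted graph: Gaussian affinity W from pairwise squared distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / sigma ** 2)
    D = np.diag(W.sum(axis=1))
    # Generalized eigenproblem; eigh returns eigenvalues in ascending order
    vals, vecs = eigh(D - W, D)
    y = vecs[:, 1]               # eigenvector of the second smallest eigenvalue
    return y > 0                 # thresholding (here at zero) gives the partition

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .4, (20, 2)), rng.normal(3, .4, (20, 2))])
print(ncut_bipartition(X).astype(int))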
Example
[Figure: (a) a noisy step image; (b) the eigenvector of the second smallest eigenvalue; (c) the resulting partition]
Example
[Figure: (a) a point set generated by two Poisson processes; (b) the partition of the point set]
Example
[Figure: (a) three image patches forming a junction; (b)-(d) the top three components of the partition]
Example
[Figure: components of the partition obtained with Ncut]