
Clustering Techniques and

Applications to Image Segmentation


Liang Shan
shan@cs.unc.edu

Roadmap
Unsupervised learning
Clustering categories
Clustering algorithms
K-means
Fuzzy c-means
Kernel-based
Graph-based

Q&A

Unsupervised learning
Definition 1
Supervised: human effort involved
Unsupervised: no human effort

Definition 2
Supervised: learning the conditional distribution P(Y|X), X: features, Y: classes
Unsupervised: learning the distribution P(X), X: features

Slide credit: Min


Clustering
What is clustering?

Clustering
Definition
Assignment of a set of observations into subsets so that observations in the same subset are similar in some sense

Clustering
Hard vs. Soft
Hard: each object can belong to only a single cluster
Soft: each object can belong to multiple clusters
E.g. Gaussian mixture model

Slide credit: Min

Clustering
Flat vs. Hierarchical
Flat: a single, unstructured partition of the data into clusters
Hierarchical: clusters form a tree
Agglomerative
Divisive

Hierarchical clustering
Agglomerative (Bottom-up)
Compute all pair-wise pattern-pattern similarity coefficients
Place each of the n patterns into a class of its own
Merge the two most similar clusters into one
Replace the two clusters with the new cluster
Re-compute inter-cluster similarity scores w.r.t. the new cluster
Repeat the above step until there are k clusters left (k can be 1)
(See the sketch below.)

Slide credit: Min
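To make the procedure concrete, here is a minimal Python sketch of the bottom-up loop; the toy data, the Euclidean distance, and the single-linkage choice are assumptions for illustration, not part of the original slides.

```python
import numpy as np

def agglomerative(X, k, linkage="single"):
    """Bottom-up clustering: start with every pattern in its own cluster,
    then repeatedly merge the two most similar clusters until k remain."""
    clusters = [[i] for i in range(len(X))]                   # each pattern in its own class
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)    # pair-wise distances

    def cluster_dist(a, b):
        d = dist[np.ix_(a, b)]
        return d.min() if linkage == "single" else d.mean()

    while len(clusters) > k:
        # find the two most similar (closest) clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        # replace the two clusters with their merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# toy usage with two tight groups and one outlier
X = np.array([[0., 0.], [0.1, 0.2], [5., 5.], [5.2, 4.9], [9., 0.]])
print(agglomerative(X, k=2))
```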

Hierarchical clustering
Agglomerative (Bottom-up)
[Figure sequence: iterations 1 through 5 merge the two most similar clusters at each step, until finally k clusters are left]

Hierarchical clustering
Divisive (Top-down)
Start at the top with all patterns in one cluster
The cluster is split using a flat clustering algorithm
This procedure is applied recursively until each pattern is in its own singleton cluster
(See the sketch below.)

Hierarchical clustering
Divisive (Top-down)
[Figure: recursive splits of the full data set]
Slide credit: Min
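A hedged sketch of the top-down procedure, using 2-means as the flat subroutine (scikit-learn is an assumed dependency) and returning the resulting cluster tree as nested lists of pattern indices.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed dependency for the flat subroutine

def divisive(X, indices=None):
    """Top-down clustering: start with all patterns in one cluster and
    recursively split with a flat algorithm (2-means) until each pattern
    is in its own singleton cluster. Returns nested lists of indices."""
    if indices is None:
        indices = np.arange(len(X))
    if len(indices) == 1:
        return [int(indices[0])]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[indices])
    left, right = indices[labels == 0], indices[labels == 1]
    if len(left) == 0 or len(right) == 0:      # degenerate split: stop recursing
        return [int(i) for i in indices]
    return [divisive(X, left), divisive(X, right)]

X = np.array([[0., 0.], [0.1, 0.2], [5., 5.], [5.2, 4.9]])
print(divisive(X))
```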

Bottom-up vs. Top-down


Which one is more complex?
Which one is more efficient?
Which one is more accurate?

Bottom-up vs. Top-down

Which one is more complex?
Top-down, because a flat clustering algorithm is needed as a subroutine.

Which one is more efficient?
Top-down. For a fixed number of top levels, using an efficient flat algorithm such as K-means, divisive algorithms are linear in the number of patterns and clusters, while agglomerative algorithms are at least quadratic.

Which one is more accurate?
Top-down. Bottom-up methods make clustering decisions based on local patterns without initially taking the global distribution into account, and these early decisions cannot be undone. Top-down clustering benefits from complete information about the global distribution when making top-level partitioning decisions.

Data set: $X = \{x_1, x_2, \ldots, x_n\}$
Clusters: $C_1, C_2, \ldots, C_k$
Codebook: $V = \{v_1, v_2, \ldots, v_k\}$
Partition matrix: $\Gamma = [\gamma_{ij}]$

K-means
Minimizes the functional:
$E(\Gamma, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} \gamma_{ij} \, \| x_j - v_i \|^2$, where $\gamma_{ij} = 1$ if $x_j \in C_i$ and $0$ otherwise

Iterative algorithm:
Initialize the codebook V with vectors randomly picked from X
Assign each pattern to the nearest codebook vector (update the partition matrix)
Recalculate each codebook vector as the mean of its assigned patterns
Repeat the above two steps until convergence
(See the sketch below.)
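A minimal NumPy sketch of the iterative algorithm above; the toy two-blob data, the random initialization scheme, and the convergence test are assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimize E(Gamma, V) = sum_ij gamma_ij ||x_j - v_i||^2 by
    alternating assignment and codebook updates."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=k, replace=False)]   # codebook: random patterns from X
    for _ in range(max_iter):
        # assign each pattern to the nearest codebook vector (partition matrix)
        labels = np.argmin(np.linalg.norm(X[:, None] - V[None, :], axis=2), axis=1)
        # recompute each codebook vector as the mean of its assigned patterns
        newV = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else V[i]
                         for i in range(k)])
        if np.allclose(newV, V):                        # converged
            break
        V = newV
    return labels, V

# toy usage: two Gaussian blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, V = kmeans(X, k=2)
print(V)
```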

K-means
Disadvantages
Dependent on initialization
(Remedy: select random seeds that are at least a distance Dmin apart, or run the algorithm many times)
Sensitive to outliers
(Remedy: use K-medoids)
Can deal only with clusters with a spherically symmetric point distribution
(Remedy: kernel trick)
Deciding K

Deciding K
Try several values of K

Image: Henry Lin

Deciding K
When k = 1, the objective function is 873.0

Image: Henry Lin

Deciding K
When k = 2, the objective function is 173.1

Image: Henry Lin

Deciding K
When k = 3, the objective function is 133.6

Image: Henry Lin

Deciding K
We can plot the objective function values for k = 1 to 6
The abrupt change at k = 2 is highly suggestive of two clusters: "knee finding" or "elbow finding"
Note that the results are not always as clear-cut as in this toy example
(See the sketch below.)

Image: Henry Lin
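A hedged sketch of the knee/elbow heuristic: run K-means for k = 1 to 6 on assumed toy data (scikit-learn is an assumed dependency) and watch where the drop in the objective flattens out.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed dependency

# toy data with two well-separated blobs (assumption for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])

# within-cluster sum of squares (the K-means objective) for k = 1..6
for k in range(1, 7):
    E = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"k = {k}: objective = {E:.1f}")
# the drop is large from k=1 to k=2 and small afterwards -> "elbow" at k=2
```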


Fuzzy C-means
Soft clustering
Data set: $X = \{x_1, x_2, \ldots, x_n\}$
Clusters: $C_1, C_2, \ldots, C_k$
Codebook: $V = \{v_1, v_2, \ldots, v_k\}$
Fuzzy partition matrix: $U = [u_{ij}]$, with $u_{ij} \in [0,1]$ and $\sum_{i=1}^{k} u_{ij} = 1$ for $j = 1, \ldots, n$

K-means minimizes
$E(\Gamma, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} \gamma_{ij} \, \| x_j - v_i \|^2$, $\gamma_{ij} = 1$ if $x_j \in C_i$, $0$ otherwise

Fuzzy c-means minimizes
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2$
$m > 1$: fuzzification parameter, usually set to 2

Fuzzy C-means
Minimize
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2$
subject to
$\sum_{i=1}^{k} u_{ij} = 1, \quad j = 1, \ldots, n$

How to solve this constrained optimization problem?
Introduce Lagrange multipliers:
$L(U, V, \lambda) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| x_j - v_i \|^2 - \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{k} u_{ij} - 1 \right)$

Fuzzy c-means
Iterative optimization
Fix V, optimize w.r.t. U:
$u_{ij} = \left[ \sum_{l=1}^{k} \left( \frac{\| x_j - v_i \|^2}{\| x_j - v_l \|^2} \right)^{1/(m-1)} \right]^{-1}$
Fix U, optimize w.r.t. V:
$v_i = \frac{\sum_{j=1}^{n} u_{ij}^m \, x_j}{\sum_{j=1}^{n} u_{ij}^m}$
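A minimal NumPy sketch of the alternating updates above with m = 2; the toy data, the initialization, and the stopping tolerance are assumptions.

```python
import numpy as np

def fuzzy_cmeans(X, k, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Alternating optimization of E(U, V) = sum_ij u_ij^m ||x_j - v_i||^2
    subject to sum_i u_ij = 1."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=k, replace=False)]        # initial codebook
    for _ in range(max_iter):
        # fix V, update U: u_ij proportional to ||x_j - v_i||^(-2/(m-1))
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)   # shape (k, n)
        d = np.fmax(d, 1e-12)                                        # avoid division by zero
        w = d ** (-2.0 / (m - 1.0))
        U = w / w.sum(axis=0, keepdims=True)
        # fix U, update V: weighted means with weights u_ij^m
        Um = U ** m
        newV = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.linalg.norm(newV - V) < tol:                           # converged
            break
        V = newV
    return U, V

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
U, V = fuzzy_cmeans(X, k=2)
print(V)
```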

Application to image segmentation
[Figure: original images and their segmentations]
Homogeneous intensity corrupted by 5% Gaussian noise: accuracy = 96.02%
Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise: accuracy = 94.41%
Image: Dao-Qiang Zhang, Song-Can Chen


Kernel substitution trick
$\| \phi(x_j) - \phi(v_i) \|^2 = \phi(x_j)^T \phi(x_j) - 2\, \phi(x_j)^T \phi(v_i) + \phi(v_i)^T \phi(v_i)$
$= K(x_j, x_j) - 2\, K(x_j, v_i) + K(v_i, v_i)$

Kernel K-means:
$E(\Gamma, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} \gamma_{ij} \, \| \phi(x_j) - \phi(v_i) \|^2$

Kernel fuzzy c-means:
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| \phi(x_j) - \phi(v_i) \|^2$
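A small numerical check of the identity above with a Gaussian RBF kernel; the kernel width sigma and the sample points are assumptions. Since K(x, x) = 1 for this kernel, the squared feature-space distance reduces to 2(1 - K(x, v)).

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

# squared distance in feature space via the kernel substitution trick:
# ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v)
x, v = np.array([1.0, 2.0]), np.array([0.5, 1.0])
d2 = rbf(x, x) - 2 * rbf(x, v) + rbf(v, v)
# for the Gaussian RBF, K(x, x) = K(v, v) = 1, so this equals 2 * (1 - K(x, v))
print(d2, 2 * (1 - rbf(x, v)))
```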

Kernel substitution trick
Kernel fuzzy c-means:
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \, \| \phi(x_j) - \phi(v_i) \|^2$

Confine ourselves to the Gaussian RBF kernel, so $K(x, x) = 1$:
$E(U, V) = 2 \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \left( 1 - K(x_j, v_i) \right)$

Introduce a penalty term containing neighborhood information:
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \left( 1 - K(x_j, v_i) \right) + \frac{\alpha}{|N_j|} \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \sum_{x_r \in N_j} \left( 1 - u_{ir} \right)^m$
Equation: Dao-Qiang Zhang, Song-Can Chen
Spatially constrained KFCM (SKFCM)
$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \left( 1 - K(x_j, v_i) \right) + \frac{\alpha}{|N_j|} \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \sum_{x_r \in N_j} \left( 1 - u_{ir} \right)^m$

$N_j$: the set of neighbors in a window around $x_j$
$|N_j|$: the cardinality of $N_j$
$\alpha$ controls the effect of the penalty term
The penalty term is minimized when the membership value for $x_j$ is large and the membership values at neighboring pixels are also large, and vice versa
[Figure: 3x3 membership windows contrasting high (0.9) and low (0.1) neighborhood agreement]
Equation: Dao-Qiang Zhang, Song-Can Chen
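A hedged sketch of how the neighborhood penalty above could be evaluated for a stack of membership maps; the 3x3 (8-neighbor) window, the edge padding, and the exponent on (1 - u_ir) follow the reconstruction given here and are assumptions, not the authors' reference implementation.

```python
import numpy as np

def spatial_penalty(U, alpha, m=2):
    """Evaluate (alpha / |N_j|) * sum_i sum_j u_ij^m * sum_{r in N_j} (1 - u_ir)^m
    for memberships U of shape (k, H, W), using the 8-neighborhood of each pixel."""
    k, H, W = U.shape
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    pad = np.pad(1.0 - U, ((0, 0), (1, 1), (1, 1)), mode="edge")   # edge padding: assumption
    # sum of (1 - u_ir)^m over the 8 neighbors of every pixel
    neigh = sum(pad[:, 1 + di:1 + di + H, 1 + dj:1 + dj + W] ** m for di, dj in offsets)
    return alpha / len(offsets) * np.sum((U ** m) * neigh)

# two-cluster toy membership map: spatially coherent memberships give a small penalty
u_fg = np.zeros((8, 8)); u_fg[:, 4:] = 0.9; u_fg[:, :4] = 0.1
U = np.stack([u_fg, 1.0 - u_fg])
print(spatial_penalty(U, alpha=1.0))
```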

FCM applied to segmentation
Homogeneous intensity corrupted by 5% Gaussian noise
[Figure: original image and the four segmentations]
FCM accuracy = 96.02%
KFCM accuracy = 96.51%
SFCM accuracy = 99.34%
SKFCM accuracy = 100.00%
Image: Dao-Qiang Zhang, Song-Can Chen

FCM applied to segmentation
Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise
[Figure: original image and the four segmentations]
FCM accuracy = 94.41%
KFCM accuracy = 91.11%
SFCM accuracy = 98.41%
SKFCM accuracy = 99.88%
Image: Dao-Qiang Zhang, Song-Can Chen

FCM applied to segmentation
Original MR image corrupted by 5% Gaussian noise
[Figure: FCM, KFCM, SFCM, and SKFCM results]
Image: Dao-Qiang Zhang, Song-Can Chen


Graph Theory-Based
Use graph theory to solve the clustering problem

Graph terminology
Adjacency matrix
Degree
Volume
Cuts

Slide credit: Jianbo Shi
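A tiny NumPy illustration of the terms above on an assumed 4-node weighted graph: the degree of a node is the sum of its edge weights, the volume of a set is the sum of the degrees of its nodes, and the cut between two sets is the total weight of the edges crossing them.

```python
import numpy as np

# weighted adjacency matrix of a toy 4-node graph (weights are assumptions)
W = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]], dtype=float)

degree = W.sum(axis=1)              # degree d(i) = sum_j w(i, j)
A, B = [0, 1], [2, 3]               # a partition of the nodes into two sets
volume_A = degree[A].sum()          # vol(A) = sum of degrees in A
cut_AB = W[np.ix_(A, B)].sum()      # cut(A, B) = total weight across the partition
print(degree, volume_A, cut_AB)
```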


Problem with min. cuts
The minimum cut criterion favors cutting small sets of isolated nodes in the graph
Not surprising, since the cut value increases with the number of edges going across the two partitioned parts
Image: Jianbo Shi and Jitendra Malik
Slide credit: Jianbo Shi

Algorithm
Given an image, set up a weighted graph G = (V, E) and set the weight on the edge connecting two nodes to be a measure of the similarity between the two nodes
Solve $(D - W)x = \lambda D x$ for the eigenvector with the second smallest eigenvalue
Use the second smallest eigenvector to bipartition the graph
Decide if the current partition should be subdivided, and recursively repartition the segmented parts if necessary
(See the sketch below.)
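A hedged sketch of the bipartition step on a tiny grayscale image: build intensity-and-position similarity weights, solve the generalized eigenproblem (D - W)x = lambda D x with SciPy (an assumed dependency), and split on the second smallest eigenvector. The weight parameters and the median splitting point are assumptions, and the recursive subdivision step is omitted.

```python
import numpy as np
from scipy.linalg import eigh   # generalized symmetric eigensolver; SciPy assumed

def ncut_bipartition(img, sigma_i=0.1, sigma_x=4.0):
    """Bipartition a small grayscale image via the normalized-cut relaxation."""
    h, w = img.shape
    coords = np.array([(r, c) for r in range(h) for c in range(w)], dtype=float)
    vals = img.reshape(-1)
    # edge weights: nodes are similar if close in intensity AND close in the image plane
    di = (vals[:, None] - vals[None, :]) ** 2
    dx = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=2)
    W = np.exp(-di / sigma_i**2) * np.exp(-dx / sigma_x**2)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    # solve (D - W) x = lambda D x; eigenvalues come back in ascending order
    _, eigvecs = eigh(D - W, D)
    second = eigvecs[:, 1]                                  # second smallest eigenvector
    return (second > np.median(second)).reshape(h, w)       # split at the median (assumption)

# noisy step image: left half dark, right half bright
rng = np.random.default_rng(0)
img = np.hstack([np.zeros((8, 4)), np.ones((8, 4))]) + 0.05 * rng.standard_normal((8, 8))
print(ncut_bipartition(img).astype(int))
```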

Example
(a) A noisy step image
(b) Eigenvector with the second smallest eigenvalue
(c) Resulting partition
Image: Jianbo Shi and Jitendra Malik

Example
(a) Point set generated by two Poisson processes
(b) Partition of the point set

Example
(a) Three image patches form a junction
(b)-(d) Top three components of the partition
Image: Jianbo Shi and Jitendra Malik

Example
Components of the partition with Ncut value less than 0.04
Image: Jianbo Shi and Jitendra Malik

Example
[Figure: additional segmentation results]
Image: Jianbo Shi and Jitendra Malik

