Introduction to Data Analytics

(ITEC 3040)

Clustering
(Modified from Aijun An's slides)

Outline

What Is Clustering?

Examples of Clustering Applications

What Is Good Clustering?

Data Representation

Similarity (or Dissimilarity) Measures

Similarity and Dissimilarity Between Objects (Cont'd)

Similarity (or Dissimilarity) Measures

Outline

Major Clustering Approaches

Partitioning Algorithms: Basic Concept

K-means

The K-Means Clustering Method

K-means example

K-means example

K-means example

K-means example

K-means example

K-means example

K-means example #2

K-means example #2

K-means example #2

Comments on the K-Means Method

A Limitation of K-means: Differing Sizes

A Limitation of K-means: Non-globular (Non-Convex) Shapes

A Problem of the K-Means Method

PAM: A K-medoids Method

Typical k-medoids algorithm (PAM)

How to Choose the New Medoid for a Cluster

Pros and Cons of PAM

Outline

Hierarchical Clustering

Hierarchical Clustering

A Dendrogram Shows How the Clusters Are Merged Hierarchically

Inter-cluster Distances in Hierarchical Clustering

Strengths and Limitations of Hierarchical Methods
