You are on page 1of 2

7.1.

5 Divisive Analysis (DIANA)


DIANA is a hierarchical clustering technique, but its main
difference with the agglomerative method (AGNES) is that it
constructs the hierarchy in the inverse order.
Initially (Step 0), there is one large cluster consisting of
all n objects. At each subsequent step, the largest available
cluster is split into two clusters until finally all clusters, comprise
of single objects. Thus, the hierarchy is built in n-1 steps.
In the first step of an agglomerative method, all possible fusions
of two objects are considered leading to
combinations. In
the divisive method based on the same principle, there are
possibilities to split the data into two clusters. This number is
considerably larger than that in the case of an agglomerative
method.
To avoid considering all possibilities, the algorithm proceeds as
follows.
1. Find the object, which has the highest average dissimilarity to all
other objects. This object initiates a new cluster a sort of
a splinter group.
2. For each object i outside the splinter group compute
3. Di = [average d(i,j) j Rsplinter group ] - [average d(i,j) j Rsplinter group]
4. Find an object h for which the difference Dh is the largest. If Dh is
positive, then h is, on the average close to the splinter group.
5. Repeat Steps 2 and 3 until all differences Dh are negative. The data
set is then split into two clusters.
6. Select the cluster with the largest diameter. The diameter of a
cluster is the largest dissimilarity between any two of its objects.
Then divide this cluster, following steps 1-4.
7. Repeat Step 5 until all clusters contain only a single object.

Divisive Coefficient (DC)


For each object i, let d ( i ) denote the diameter of the last cluster
to which it belongs (before being split off as a single object),
divided by the diameter of the whole data set.
The divisive coefficient (DC), given by

indicates the strength of the clustering structure found by the


algorithm.
Graphical display
The hierarchy obtained from Diana can be represented
graphically by the dissimilarity banner.
Dissimilarity banner consists of lines with stars and stripes,
which repeat the identifiers of the objects. The banner is read
from left to right, but the fixed scales above and below the
banner range from 1.00 (corresponding to the diameter of the
entire data set) to 0.00 (corresponding to the diameter of the
singletons)
Each line with stars ends at the diameter at which the cluster is
split. The actual diameter of the data set (corresponding to 1.00
in the banner) is displayed just below the banner.
The divisive coefficient defined above can be seen as the average
width of the divisive banner.

where li is the length of the line containing the identifier of


object i. Thus it can be seen as the counterpart of the
agglomerative coefficient (AC):
DC = 0

No cluster structure

DC = 1

Clear cluster structure

http://www.unesco.org/webworld/idams/advguide/Chapt7_1_5.htm

anggal 27 april 2016

You might also like