Cluster validation
Pasi Fränti
15.4.2014
Introduction
Supervised classification:
  Class labels known for the ground truth
  Accuracy, precision, recall
Cluster analysis:
  No class labels
Validation is needed to:
  Compare clustering algorithms
  Solve the number of clusters
  Avoid finding patterns in noise
Example (figure): classification of oranges and apples.
  Oranges: Precision = 5/5 = 100%, Recall = 5/7 = 71%
  Apples: Precision = 3/5 = 60%, Recall = 3/3 = 100%
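The precision and recall figures of the oranges-and-apples example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `precision_recall` and the concrete label lists are illustrative assumptions, chosen only so that the counts match the example (7 oranges, 3 apples, 5 items predicted as each class).

```python
def precision_recall(true_labels, predicted_labels, target):
    """Precision and recall of class `target` from two parallel label lists."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels)
               if t == target and p == target)
    predicted = sum(1 for p in predicted_labels if p == target)
    actual = sum(1 for t in true_labels if t == target)
    return hits / predicted, hits / actual

# Hypothetical data: 7 oranges and 3 apples; two oranges are predicted as apples.
truth = ["orange"] * 7 + ["apple"] * 3
pred = ["orange"] * 5 + ["apple"] * 2 + ["apple"] * 3

orange_p, orange_r = precision_recall(truth, pred, "orange")
apple_p, apple_r = precision_recall(truth, pred, "apple")
print(f"oranges: precision={orange_p:.2f}, recall={orange_r:.2f}")
print(f"apples:  precision={apple_p:.2f}, recall={apple_r:.2f}")
```

Such measures need class labels, which is exactly what cluster analysis lacks.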
Measuring clustering validity
Internal index:
  Validate without external information
  With different numbers of clusters
  Solve the number of clusters
External index:
  Validate against ground truth
  Compare two clusters (how similar)
Clustering of random data
[Figure: four scatter plots on the unit square (x and y from 0 to 1). Panels: Random Points, DBSCAN, K-means, Complete Link.]
Cluster validation process
1. Distinguishing whether non-random structure actually exists in the data (one cluster).
2. Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels.
3. Evaluating how well the results of a cluster analysis fit the data without reference to external information.
4. Comparing the results of two different sets of cluster analyses to determine which is better.
5. Determining the number of clusters.
Cluster validation refers to procedures that evaluate the results of clustering in a quantitative and objective fashion. [Jain & Dubes, 1988]
How to be quantitative: employ the measures.
How to be objective: validate the measures!
Internal indexes
Ground truth is rarely available, but unsupervised validation must still be done.
Minimize (or maximize) an internal index:
  Variances within clusters and between clusters
  Rate-distortion method
  F-ratio
  Davies-Bouldin index (DBI)
  Bayesian information criterion (BIC)
  Silhouette coefficient
  Minimum description length principle (MDL)
  Stochastic complexity (SC)
Mean square error (MSE)
The more clusters, the smaller the MSE.
There is a small knee point near the correct value, but how to detect it?
In the example, the knee point is between 14 and 15 clusters.
[Figure: example data set and its SSE as a function of the number of clusters K (K = 2 to 30), with the 5-cluster and 10-cluster solutions marked.]
From MSE to cluster validity
Minimize within-cluster variance (MSE)
Maximize between-cluster variance
[Figure: intra-cluster variance is minimized; inter-cluster variance is maximized.]
Jump point of MSE (rate-distortion approach)
[Figure: curve plotted against the number of clusters (0 to 100); vertical axis from 0 to 0.12.]
Sum-of-squares based indexes
  Ball and Hall (1965): $SSW / k$
  Marriott (1971): $k^2 |W|$
  Calinski & Harabasz (1974): $\dfrac{SSB / (k - 1)}{SSW / (N - k)}$
  Hartigan (1975): $\log(SSB / SSW)$
  Xu (1997): $d \log_2\left(\sqrt{SSW / (d N^2)}\right) + \log k$
(d is the dimension of the data; N is the size of the data; k is the number of clusters)
Between clusters:
$$SSB(C, k) = \sum_{j=1}^{k} n_j \|c_j - \bar{x}\|^2$$
Total variance of the data set:
$$\sigma^2(X) = \sum_{i=1}^{N} \|x_i - c_{p(i)}\|^2 + \sum_{j=1}^{k} n_j \|c_j - \bar{x}\|^2 = SSW + SSB$$
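The decomposition of the total variance into SSW and SSB can be checked numerically. The following is a minimal pure-Python sketch; the helper names (`mean`, `sq_dist`, `sum_of_squares`, `calinski_harabasz`) and the four-point data set are illustrative assumptions, not part of the slides.

```python
def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sum_of_squares(points, labels):
    """Return (SSW, SSB, total) for a hard partition given as one label per point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    grand_mean = mean(points)
    ssw = ssb = 0.0
    for members in clusters.values():
        c = mean(members)
        ssw += sum(sq_dist(p, c) for p in members)    # within-cluster part
        ssb += len(members) * sq_dist(c, grand_mean)  # between-cluster part
    total = sum(sq_dist(p, grand_mean) for p in points)
    return ssw, ssb, total

def calinski_harabasz(ssw, ssb, n, k):
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical toy data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
ssw, ssb, total = sum_of_squares(points, labels)
print(ssw, ssb, total, calinski_harabasz(ssw, ssb, len(points), 2))
```

For this toy partition SSW + SSB equals the total sum of squares, as the identity requires.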
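F-ratio variance test
Variance-ratio F-test:
  Measures the ratio of the between-groups variance to the within-groups variance (original F-test).
F-ratio (WB-index):
$$F = \frac{k \sum_{i=1}^{N} \|x_i - c_{p(i)}\|^2}{\sum_{j=1}^{k} n_j \|c_j - \bar{x}\|^2} = \frac{k \cdot SSW}{SSB} = \frac{k \cdot SSW}{\sigma^2(X) - SSW}$$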
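As a rough illustration of how the F-ratio (WB-index) is used to choose the number of clusters, the sketch below evaluates k * SSW / SSB for three hand-made candidate partitions and picks the minimum. The data (three unit squares), the candidate labelings, and all function names are assumptions made for this example; in practice each candidate would come from a clustering algorithm.

```python
def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def wb_index(points, labels):
    """F-ratio (WB-index): k * SSW / SSB for one hard partition."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    grand = centroid(points)
    ssw = ssb = 0.0
    for members in groups.values():
        c = centroid(members)
        ssw += sum(sq_dist(p, c) for p in members)
        ssb += len(members) * sq_dist(c, grand)
    return len(groups) * ssw / ssb

def square(x0, y0):
    return [(x0, y0), (x0, y0 + 1), (x0 + 1, y0), (x0 + 1, y0 + 1)]

# Hypothetical data: three well-separated unit squares (12 points).
points = square(0, 0) + square(10, 0) + square(0, 10)
candidates = {
    2: [0] * 8 + [1] * 4,                # two squares merged
    3: [0] * 4 + [1] * 4 + [2] * 4,      # one cluster per square
    4: [0, 0, 3, 3] + [1] * 4 + [2] * 4, # first square split in two
}
scores = {k: wb_index(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

The index is minimized, so the candidate with the smallest value is taken as the estimate of the number of clusters.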
Calculation of F-ratio
[Figure: intermediate results of the F-ratio calculation as a function of the number of clusters (2 to 25). Curves: nominator (k * MSE), divider (between cluster), and F-ratio total. Vertical axes: cost and F-test value.]
F-ratio for data set S1
[Figure: F-ratio (x10^5) vs. number of clusters (25 down to 5) for the PNN and IS algorithms; the minimum is marked.]
F-ratio for data set S2
[Figure: F-ratio (x10^5) vs. number of clusters (25 down to 5) for the PNN and IS algorithms; the minimum is marked.]
F-ratio for data set S3
[Figure: F-ratio vs. number of clusters (25 down to 5) for the PNN and IS algorithms; the minimum is marked.]
F-ratio for data set S4
[Figure: F-ratio vs. number of clusters (25 down to 5) for the PNN and IS algorithms; minima are marked at 16 and at 15.]
Extension of the F-ratio for S3
[Figure: F-ratio vs. number of clusters (25 down to 5) for data set S3.]
Sum-of-squares based index
[Figure: curves plotted against the number of clusters (2 to 25).]
Silhouette coefficient [Kaufman & Rousseeuw, 1990]
Cohesion a(x): average distance of x to all other vectors in the same cluster.
Separation b(x): average distance of x to the vectors in another cluster, taking the minimum over the other clusters.
Silhouette s(x):
$$s(x) = \frac{b(x) - a(x)}{\max\{a(x), b(x)\}}$$
$s(x) \in [-1, +1]$: -1 = bad, 0 = indifferent, 1 = good.
Silhouette coefficient (SC):
$$SC = \frac{1}{N} \sum_{i=1}^{N} s(x_i)$$
[Figure: cohesion and separation of a vector x.]
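The definitions of cohesion, separation, and the silhouette translate almost directly into code. This is a minimal sketch using Euclidean distance and a brute-force O(N^2) loop; the function name, the convention of scoring singleton clusters as 0, and the toy data are assumptions of this example. It expects at least two clusters and no cluster made only of identical points.

```python
import math

def silhouette_coefficient(points, labels):
    """Mean silhouette s(x) over all points of a hard partition."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # Cohesion a(x): mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # Separation b(x): smallest mean distance to any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 1-D data: two tight pairs.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

A partition that follows the structure of the data scores close to +1, while one that cuts across it scores below zero.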
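Formula of BIC in partitioning-based clustering
$$BIC = \sum_{i=1}^{m} \left( n_i \log n_i - n_i \log n - \frac{n_i d}{2} \log(2\pi) - \frac{n_i}{2} \log \Sigma_i - \frac{n_i - m}{2} \right) - \frac{1}{2} m \log n$$
d = dimension of the data set
n_i = size of the i-th cluster
Σ_i = covariance of the i-th cluster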
Knee point detection on BIC
Soft partitions
Comparison of the indexes: K-means
Comparison of the indexes: Random Swap
Part III: Stochastic complexity for binary data
Stochastic complexity
Principle of minimum description length (MDL): find the clustering C that can be used for describing the data with minimum information.
Data = clustering + description of data.
Clustering is defined by the centroids.
Data is defined by:
  which cluster (partition index)
  where in the cluster (difference from the centroid)
Solution for binary data
$$SC = \sum_{j=1}^{M} \sum_{i=1}^{d} n_j \, h\left(\frac{n_{ij}}{n_j}\right) - \sum_{j=1}^{M} n_j \log \frac{n_j}{N} + \frac{d}{2} \sum_{j=1}^{M} \log \max(1, n_j)$$
where
$$h(p) = -p \log p - (1 - p) \log(1 - p)$$
[Figure: SC vs. number of clusters (50 to 90) for Repeated K-means and RLS.]
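Part IV: External indexes
Pair-counting measures
Measure the number of pairs that are in:
Same class both in P and G:
$$a = \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{K'} n_{ij} (n_{ij} - 1)$$
Same class in P but different in G:
$$b = \frac{1}{2} \left( \sum_{j=1}^{K'} n_{.j}^2 - \sum_{i=1}^{K} \sum_{j=1}^{K'} n_{ij}^2 \right)$$
Different classes in P but same in G:
$$c = \frac{1}{2} \left( \sum_{i=1}^{K} n_{i.}^2 - \sum_{i=1}^{K} \sum_{j=1}^{K'} n_{ij}^2 \right)$$
Different classes both in P and G:
$$d = \frac{1}{2} \left( N^2 + \sum_{i=1}^{K} \sum_{j=1}^{K'} n_{ij}^2 - \left( \sum_{i=1}^{K} n_{i.}^2 + \sum_{j=1}^{K'} n_{.j}^2 \right) \right)$$
[Figure: diagram of G and P showing the pair types a, b, c, d.]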
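The four pair counts can be computed from the contingency table without enumerating pairs. The sketch below implements the closed-form sums literally and checks them against a brute-force enumeration. Which partition indexes the rows (i) and which the columns (j) is an assumption of this example: here the first argument supplies the row marginals n_i. and the second the column marginals n_.j, so swapping the arguments exchanges b and c.

```python
from collections import Counter
from itertools import combinations

def pair_counts(row_labels, col_labels):
    """Pair counts (a, b, c, d) from the contingency table n_ij."""
    n = len(row_labels)
    sum_ij = sum(v * v for v in Counter(zip(row_labels, col_labels)).values())
    sum_i = sum(v * v for v in Counter(row_labels).values())   # row marginals n_i.
    sum_j = sum(v * v for v in Counter(col_labels).values())   # column marginals n_.j
    a = (sum_ij - n) // 2           # equals 1/2 * sum n_ij (n_ij - 1)
    b = (sum_j - sum_ij) // 2       # same column cluster, different row cluster
    c = (sum_i - sum_ij) // 2       # same row cluster, different column cluster
    d = (n * n + sum_ij - (sum_i + sum_j)) // 2
    return a, b, c, d

def pair_counts_brute_force(row_labels, col_labels):
    a = b = c = d = 0
    for x, y in combinations(range(len(row_labels)), 2):
        same_row = row_labels[x] == row_labels[y]
        same_col = col_labels[x] == col_labels[y]
        if same_row and same_col:
            a += 1
        elif same_col:
            b += 1
        elif same_row:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Hypothetical labelings of six objects.
rows = [0, 0, 0, 1, 1, 1]
cols = [0, 0, 1, 1, 2, 2]
counts = pair_counts(rows, cols)
print(counts, pair_counts_brute_force(rows, cols))
```

The four counts always sum to N(N-1)/2, the total number of object pairs.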
Rand and adjusted Rand index [Rand, 1971] [Hubert and Arabie, 1985]
Agreement: a, d
Disagreement: b, c
$$RI(P, G) = \frac{a + d}{a + b + c + d}$$
$$ARI = \frac{RI - E(RI)}{1 - E(RI)}$$
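A small sketch of both indexes follows. The slide does not spell out E(RI); the code assumes the usual hypergeometric model of Hubert and Arabie, under which E(RI) can be written with the numbers of same-cluster pairs in each partition. The function name and the example labelings are illustrative, and the degenerate case 1 - E(RI) = 0 (for example, both partitions consisting of a single cluster) is not handled.

```python
from collections import Counter
from math import comb

def rand_indexes(p_labels, g_labels):
    """Return (RI, ARI) for two hard partitions given as parallel label lists."""
    total = comb(len(p_labels), 2)
    same_both = sum(comb(v, 2) for v in Counter(zip(p_labels, g_labels)).values())
    same_p = sum(comb(v, 2) for v in Counter(p_labels).values())
    same_g = sum(comb(v, 2) for v in Counter(g_labels).values())
    # Agreements a + d: all pairs minus those joined in exactly one partition.
    agreements = total - (same_p - same_both) - (same_g - same_both)
    ri = agreements / total
    # Expected RI under random labelings with fixed cluster sizes (assumed model).
    expected_ri = 1 + 2 * same_p * same_g / total ** 2 - (same_p + same_g) / total
    ari = (ri - expected_ri) / (1 - expected_ri)
    return ri, ari

ri_same, ari_same = rand_indexes([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, renamed
ri, ari = rand_indexes([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(ri_same, ari_same, round(ri, 3), round(ari, 3))
```

Identical partitions give 1 for both indexes regardless of how the clusters are named, while the adjustment pulls chance-level agreement toward 0.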
External indexes
If true class labels (ground truth) are known, the validity of a clustering can be verified by comparing the class labels and the clustering labels.
Rand index (example)
[Figure: worked example; the surviving fragment reads "Different clusters in ground truth: 20, 72".]
External index categories:
  Pair counting
  Information theoretic
  Set matching
Pair-counting measures
Agreement: a, d
Disagreement: b, c
Rand index: $RI(P, G) = \dfrac{a + d}{a + b + c + d}$
Adjusted Rand index: $ARI = \dfrac{RI - E(RI)}{1 - E(RI)}$
Information-theoretic measures
Based on the concept of entropy.
$$MI(P, G) = \sum_{i=1}^{K} \sum_{j=1}^{K'} \frac{n_{ij}}{N} \log \frac{N n_{ij}}{n_i n_j}$$
Mutual information (MI) measures the information that two clusterings share, and variation of information (VI) is the complement of MI.
n_i: size of cluster P_i
n_j: size of cluster G_j
n_ij: number of shared objects in P_i and G_j
[Figure: diagram of the entropies H(P) and H(G), showing H(P|G), MI, H(G|P), and VI(P, G).]
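The mutual information formula can be evaluated directly from the cluster sizes and their overlaps. The sketch below uses base-2 logarithms, which is an assumption because the slide leaves the base open, and computes VI as H(P) + H(G) - 2 MI, that is, H(P|G) + H(G|P), which is the usual definition and is assumed here to be what the diagram depicts.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def mutual_information(p_labels, g_labels):
    n = len(p_labels)
    ni, nj = Counter(p_labels), Counter(g_labels)
    nij = Counter(zip(p_labels, g_labels))
    # Only non-empty cells contribute, so log2 is never taken of zero.
    return sum(c / n * log2(n * c / (ni[i] * nj[j])) for (i, j), c in nij.items())

def variation_of_information(p_labels, g_labels):
    return entropy(p_labels) + entropy(g_labels) - 2 * mutual_information(p_labels, g_labels)

p = [0, 0, 1, 1]
mi_same = mutual_information(p, [0, 0, 1, 1])
vi_same = variation_of_information(p, [0, 0, 1, 1])
mi_indep = mutual_information(p, [0, 1, 0, 1])
vi_indep = variation_of_information(p, [0, 1, 0, 1])
print(mi_same, vi_same, mi_indep, vi_indep)
```

Identical partitions share all their information (VI = 0), whereas independent ones share none.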
Set-matching measures
Categories:
  Point-level
  Cluster-level
Three problems:
  How to measure the similarity of two clusters?
  How to pair clusters?
  How to calculate the overall similarity?
Similarity of two clusters
Jaccard: $J = \dfrac{|P_i \cap G_j|}{|P_i \cup G_j|}$
Sorensen-Dice: $SD = \dfrac{2 |P_i \cap G_j|}{|P_i| + |G_j|}$
Braun-Blanquet: $BB = \dfrac{|P_i \cap G_j|}{\max(|P_i|, |G_j|)}$
Example (figure): three clusters P1 (n1 = 1000), P2 (n2 = 250), and P3 (n3 = 200).
Criterion    P2, P3   P2, P1
H/NVD/CSI    200      250
J            0.80     0.25
SD           0.89     0.40
BB           0.80     0.25
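The table values can be reproduced with Python sets. The sketch assumes the clusters are nested (P3 inside P2 inside P1), which is not stated explicitly but is consistent with the shared-object counts of 200 and 250; the function names are illustrative.

```python
def jaccard(p, g):
    return len(p & g) / len(p | g)

def sorensen_dice(p, g):
    return 2 * len(p & g) / (len(p) + len(g))

def braun_blanquet(p, g):
    return len(p & g) / max(len(p), len(g))

# Assumed nesting: P3 is a subset of P2, which is a subset of P1.
p1 = set(range(1000))
p2 = set(range(250))
p3 = set(range(200))

for name, measure in [("J", jaccard), ("SD", sorensen_dice), ("BB", braun_blanquet)]:
    print(f"{name}: {measure(p2, p3):.2f} {measure(p2, p1):.2f}")
```

Jaccard and Braun-Blanquet coincide in this nested case because the union then equals the larger cluster.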
Pairing
Matching problem in a weighted bipartite graph.
[Figure: bipartite graph between the clusters G1, G2, G3 of G and the clusters P1, P2, P3 of P.]
Matching or pairing?
Algorithms:
  Greedy
  Optimal pairing
Normalized Van Dongen
Matching based on the number of shared objects:
$$NVD = \frac{2N - \sum_{i=1}^{K} \max_{j=1}^{K'} n_{ij} - \sum_{j=1}^{K'} \max_{i=1}^{K} n_{ij}}{2N}$$
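The NVD formula needs only the row and column maxima of the contingency table. A minimal sketch, with an illustrative function name and toy labelings:

```python
from collections import Counter, defaultdict

def normalized_van_dongen(p_labels, g_labels):
    """NVD of two hard partitions given as parallel label lists (0 = identical)."""
    n = len(p_labels)
    row_max = defaultdict(int)  # best overlap of each cluster of P
    col_max = defaultdict(int)  # best overlap of each cluster of G
    for (i, j), count in Counter(zip(p_labels, g_labels)).items():
        row_max[i] = max(row_max[i], count)
        col_max[j] = max(col_max[j], count)
    return (2 * n - sum(row_max.values()) - sum(col_max.values())) / (2 * n)

nvd_same = normalized_van_dongen([0, 0, 1, 1], [5, 5, 7, 7])
nvd = normalized_van_dongen([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(nvd_same, nvd)
```

Each cluster is matched to its best partner independently, so one cluster may serve as the best match of several others.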
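Pair Set Index (PSI)
Similarity of two clusters:
$$S_{ij} = \frac{n_{ij}}{\max(|P_i|, |G_j|)}$$
Total similarity:
$$S = \sum_{i} S_{ij}$$
where j is the index of the cluster paired with P_i.
Examples (figure): S_ij = 1 and S_ji = 1 give S = 100%; S_ij = 0.5 and S_ji = 0.5 give S = 50%.
Optimal pairing using the Hungarian algorithm.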
Pair Set Index (PSI)
Adjustment for chance:
$$E(S) = \sum_{i=1}^{\min(K, K')} \frac{n_i m_i / N}{\max(n_i, m_i)}$$
Sizes of the clusters in P: n_1 > n_2 > ... > n_K
Sizes of the clusters in G: m_1 > m_2 > ... > m_K'
Transformation: Max(S) → 1, E(S) → 0
$$PSI = \begin{cases} \dfrac{S - E(S)}{\max(K, K') - E(S)}, & S \ge E(S) \\ 0, & S < E(S) \end{cases}$$
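The pieces of PSI (cluster similarity, optimal pairing, chance adjustment) can be put together as follows. This is a sketch under stated assumptions: the optimal pairing is found by exhaustive search over permutations, a stand-in for the Hungarian algorithm that is only practical for a handful of clusters, and the degenerate case where max(K, K') equals E(S) (both partitions have a single cluster) is assumed to score 1.

```python
from collections import Counter
from itertools import permutations

def pair_sets_index(p_labels, g_labels):
    """PSI of two hard partitions given as parallel label lists (small K only)."""
    n = len(p_labels)
    p_sizes, g_sizes = Counter(p_labels), Counter(g_labels)
    nij = Counter(zip(p_labels, g_labels))
    p_ids, g_ids = list(p_sizes), list(g_sizes)

    def similarity(i, j):
        # S_ij = n_ij / max(|P_i|, |G_j|); Counter yields 0 for disjoint clusters.
        return nij[(i, j)] / max(p_sizes[i], g_sizes[j])

    # Optimal one-to-one pairing by brute force (stand-in for the Hungarian algorithm).
    if len(p_ids) <= len(g_ids):
        pairings = (zip(p_ids, chosen) for chosen in permutations(g_ids, len(p_ids)))
    else:
        pairings = (zip(chosen, g_ids) for chosen in permutations(p_ids, len(g_ids)))
    s = max(sum(similarity(i, j) for i, j in pairing) for pairing in pairings)

    # Expected similarity E(S): pair clusters by size rank, largest with largest.
    n_sorted = sorted(p_sizes.values(), reverse=True)
    m_sorted = sorted(g_sizes.values(), reverse=True)
    e = sum((ni * mi / n) / max(ni, mi) for ni, mi in zip(n_sorted, m_sorted))

    k_max = max(len(p_ids), len(g_ids))
    if s < e:
        return 0.0
    if k_max == e:  # assumed convention: a single cluster on both sides
        return 1.0
    return (s - e) / (k_max - e)

psi_same = pair_sets_index([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
psi_chance = pair_sets_index([0, 0, 0, 0], [0, 0, 1, 1])
psi_ab = pair_sets_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
psi_ba = pair_sets_index([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1])
print(psi_same, psi_chance, round(psi_ab, 4), round(psi_ba, 4))
```

For larger numbers of clusters the exhaustive search would be replaced by a proper assignment solver.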
Properties of PSI
  Symmetric
  Normalized to the number of clusters
  Normalized to the size of clusters
  Adjusted
  Range in [0, 1]
  The number of clusters can be different
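Random partitioning
Changing the number of clusters in P from 1 to 20
[Figure: partitions labeled P2 (2000, 3000), P3 (2500, 3000), and P4 (3000).]
Wrongly labeling some part of each cluster
[Figure: partitions labeled G (1000, 2000, 3000), G1 (1000, 2000), and P1 (800, 1800).]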
Cluster-level measure
Comparing partitions of centroids
Given two sets of centroids C and C', find the nearest-neighbor mappings (C → C'):
$$q_i \leftarrow \arg\min_{1 \le j \le K_2} \|c_i - c'_j\|^2, \quad i \in [1, K_1]$$
Detect prototypes with no mapping:
$$orphan(c'_j) = \begin{cases} 1, & q_i \ne j \;\; \forall i \\ 0, & \text{otherwise} \end{cases}$$
[Figure: mapping counts between two centroid sets. A count of 1 indicates the same cluster; the index value equals the number of zero-mappings (here CI = 2).]
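Example of the centroid index
[Figure: mapping counts in an example. Two clusters but only one allocated (count 0); three mapped into one (count 3).]
Adjusted Rand vs. centroid index
[Figure: pairwise comparison of the solutions of merge-based (PNN), Random Swap, and K-means. Values shown: ARI = 0.91 with CI = 0; ARI = 0.82 with CI = 1; ARI = 0.88 with CI = 1.]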
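Centroid index properties
Mapping is not symmetric (C → C' ≠ C' → C).
Symmetric centroid index:
$$CI_2(C, C') = \max\{CI_1(C, C'), CI_1(C', C)\}$$
Pointwise variant (Centroid Similarity Index):
  Matching clusters based on CI
  Similarity of clusters:
$$CSI = \frac{S_{12} + S_{21}}{2}, \quad \text{where } S_{12} = \frac{\sum_{i=1}^{K_1} |C_i \cap C'_j|}{N}, \quad S_{21} = \frac{\sum_{j=1}^{K_2} |C'_j \cap C_i|}{N}$$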
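The nearest-neighbor mapping, the orphan count, and the symmetric index can be sketched in a few lines. The function names `ci1` and `ci2` mirror the notation CI_1 and CI_2 but are otherwise assumptions of this example, as are the toy centroids; ties in the nearest-neighbor search are broken by the lowest index.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ci1(source, target):
    """Number of centroids in `target` that no centroid of `source` maps to."""
    mapped = set()
    for c in source:
        nearest = min(range(len(target)), key=lambda j: sq_dist(c, target[j]))
        mapped.add(nearest)
    return len(target) - len(mapped)  # count of orphans

def ci2(first, second):
    """Symmetric centroid index: the larger of the two one-directional counts."""
    return max(ci1(first, second), ci1(second, first))

# Hypothetical solutions: `b` puts two centroids in the first cluster of `a`
# and none in its second cluster.
a = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
b = [(0.0, 1.0), (1.0, 0.0), (20.0, 1.0)]
print(ci1(a, b), ci1(b, a), ci2(a, b))
```

The two directions disagree here, which is why the symmetric variant takes the maximum.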
Centroid index
[Figure: example with mapping counts 3, 0, 2, 0, 4 and similarity values 0.87, 0.87, 1, 0.65.]
Mean squared errors
Clustering quality (MSE)
Data set KM RKM KM++ XM AC RS GKM GA
Bridge 179.76 176.92 173.64 179.73 168.92 164.64 164.78 161.47
House 6.67 6.43 6.28 6.20 6.27 5.96 5.91 5.87
Miss America 5.95 5.83 5.52 5.92 5.36 5.28 5.21 5.10
House 3.61 3.28 2.50 3.57 2.62 2.83 - 2.44
Birch1 5.47 5.01 4.88 5.12 4.73 4.64 - 4.64
Birch2 7.47 5.65 3.07 6.29 2.28 2.28 - 2.28
Birch3 2.51 2.07 1.92 2.07 1.96 1.86 - 1.86
S1 19.71 8.92 8.92 8.92 8.93 8.92 8.92 8.92
S2 20.58 13.28 13.28 15.87 13.44 13.28 13.28 13.28
S3 19.57 16.89 16.89 16.89 17.70 16.89 16.89 16.89
S4 17.73 15.70 15.70 15.71 17.52 15.70 15.71 15.70
Adjusted Rand index (ARI)
Data set KM RKM KM++ XM AC RS GKM GA
Bridge 0.38 0.40 0.39 0.37 0.43 0.52 0.50 1
House 0.40 0.40 0.44 0.47 0.43 0.53 0.53 1
Miss America 0.19 0.19 0.18 0.20 0.20 0.20 0.23 1
House 0.46 0.49 0.52 0.46 0.49 0.49 - 1
Birch 1 0.85 0.93 0.98 0.91 0.96 1.00 - 1
Birch 2 0.81 0.86 0.95 0.86 1 1 - 1
Birch 3 0.74 0.82 0.87 0.82 0.86 0.91 - 1
S1 0.83 1.00 1.00 1.00 1.00 1.00 1.00 1.00
S2 0.80 0.99 0.99 0.89 0.98 0.99 0.99 0.99
S3 0.86 0.96 0.96 0.96 0.92 0.96 0.96 0.96
S4 0.82 0.93 0.93 0.94 0.77 0.93 0.93 0.93
Normalized mutual information (NMI)
Data set KM RKM KM++ XM AC RS GKM GA
Bridge 0.77 0.78 0.78 0.77 0.80 0.83 0.82 1.00
House 0.80 0.80 0.81 0.82 0.81 0.83 0.84 1.00
Miss America 0.64 0.64 0.63 0.64 0.64 0.66 0.66 1.00
House 0.81 0.81 0.82 0.81 0.81 0.82 - 1.00
Birch 1 0.95 0.97 0.99 0.96 0.98 1.00 - 1.00
Birch 2 0.96 0.97 0.99 0.97 1.00 1.00 - 1.00
Birch 3 0.90 0.94 0.94 0.93 0.93 0.96 - 1.00
S1 0.93 1.00 1.00 1.00 1.00 1.00 1.00 1.00
S2 0.90 0.99 0.99 0.95 0.99 0.93 0.99 0.99
S3 0.92 0.97 0.97 0.97 0.94 0.97 0.97 0.97
S4 0.88 0.94 0.94 0.95 0.85 0.94 0.94 0.94
Normalized Van Dongen (NVD)
Data set KM RKM KM++ XM AC RS GKM GA
Bridge 0.45 0.42 0.43 0.46 0.38 0.32 0.33 0.00
House 0.44 0.43 0.40 0.37 0.40 0.33 0.31 0.00
Miss America 0.60 0.60 0.61 0.59 0.57 0.55 0.53 0.00
House 0.40 0.37 0.34 0.39 0.39 0.34 - 0.00
Birch 1 0.09 0.04 0.01 0.06 0.02 0.00 - 0.00
Birch 2 0.12 0.08 0.03 0.09 0.00 0.00 - 0.00
Birch 3 0.19 0.12 0.10 0.13 0.13 0.06 - 0.00
S1 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S2 0.11 0.00 0.00 0.06 0.01 0.04 0.00 0.00
S3 0.08 0.02 0.02 0.02 0.05 0.00 0.00 0.02
S4 0.11 0.04 0.04 0.03 0.13 0.04 0.04 0.04
Centroid index
C-Index (CI2)
Data set KM RKM KM++ XM AC RS GKM GA
Bridge 74 63 58 81 33 33 35 0
House 56 45 40 37 31 22 20 0
Miss America 88 91 67 88 38 43 36 0
House 43 39 22 47 26 23 --- 0
Birch 1 7 3 1 4 0 0 --- 0
Birch 2 18 11 4 12 0 0 --- 0
Birch 3 23 11 7 10 7 2 --- 0
S1 2 0 0 0 0 0 0 0
S2 2 0 0 1 0 0 0 0
S3 1 0 0 0 0 0 0 0
S4 1 0 0 0 1 0 0 0
Centroid Similarity Index (CSI)
Data set KM RKM KM++ XM AC RS GKM GA
Method            Description            MSE
GKM               Global K-means         164.78
RS                Random swap (5k)       164.64
GA                Genetic algorithm      161.47
RS8M              Random swap (8M)       161.02
GAIS-2002         GAIS                   160.72
  + RS1M          GAIS + RS (1M)         160.49
  + RS8M          GAIS + RS (8M)         160.43
GAIS-2012         GAIS                   160.68
  + RS1M          GAIS + RS (1M)         160.45
  + RS8M          GAIS + RS (8M)         160.39
  + PRS           GAIS + PRS             160.33
  + RS8M + PRS    GAIS + RS (8M) + PRS   160.28
Centroid index values
Main algorithm   RS8M  GAIS 2002  +RS1M  +RS8M  GAIS 2012  +RS1M  +RS8M  +PRS  +RS8M+PRS
RS8M             ---   19         19     19     23         24     24     23    22
GAIS (2002)      23    ---        0      0      14         15     15     14    16
  + RS1M         23    0          ---    0      14         15     15     14    13
  + RS8M         23    0          0      ---    14         15     15     14    13
GAIS (2012)      25    17         18     18     ---        1      1      1     1
  + RS1M         25    17         18     18     1          ---    0      0     1
  + RS8M         25    17         18     18     1          0      ---    0     1
  + PRS          25    17         18     18     1          0      0      ---   1
  + RS8M + PRS   24    17         18     18     1          1      1      1     ---
Summary of external indexes (existing measures)
Part VI: Efficient implementation
Strategies for efficient search
Brute force: solve the clustering for every possible number of clusters.
Stepwise: as in brute force, but start from the previous solution and iterate less.
Criterion-guided search: integrate the cost function directly into the optimization function.
Brute-force search strategy
Search for each number of clusters separately: 100 %
Stepwise search strategy
Start from the previous result: 30-40 %
Criterion-guided search
Integrate with the cost function: 3-6 %
[Figures: each strategy is illustrated against the number of clusters.]
Stopping criterion for stepwise search strategy
[Figure: evaluation function value vs. iteration number, marking the starting point f_1 (iteration 1), the halfway value f_{k/2}, the current value f_k, and the estimated value f_{3k/2}. The stopping rule on the slide involves f_{3k/2}, f_1, f_k, T_min, and L.]
Comparison of search strategies
[Figure: percentage (0 to 100 %) as a function of data dimensionality (2 to 15) for DLS, CA, Stepwise/FCM, Stepwise/LBG-U, and Stepwise/K-means.]
Open questions
Iterative algorithm (K-means or Random Swap) with criterion-guided search
or
Hierarchical algorithm???
Potential topic for MSc or PhD thesis!
Literature
1. G.W. Milligan and M.C. Cooper, "An examination of procedures for determining the number of clusters in a data set", Psychometrika, Vol. 50, 1985, pp. 159-179.
2. E. Dimitriadou, S. Dolnicar, and A. Weingessel, "An examination of indexes for determining the number of clusters in binary data sets", Psychometrika, Vol. 67, No. 1, 2002, pp. 137-160.
3. D.L. Davies and D.W. Bouldin, "A cluster separation measure", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224-227, 1979.
4. J.C. Bezdek and N.R. Pal, "Some new indexes of cluster validity", IEEE Transactions on Systems, Man and Cybernetics, 28(3), 302-315, 1998.
5. H. Bischof, A. Leonardis, and A. Selb, "MDL principle for robust vector quantization", Pattern Analysis and Applications, 2(1), 59-72, 1999.
6. P. Fränti, M. Xu and I. Kärkkäinen, "Classification of binary vectors by using DeltaSC distance to minimize stochastic complexity", Pattern Recognition Letters, 24(1-3), 65-73, January 2003.
7. G.M. James and C.A. Sugar, "Finding the Number of Clusters in a Dataset: An Information Theoretic Approach", Journal of the American Statistical Association, vol. 98, 397-408, 2003.
8. P.K. Ito, "Robustness of ANOVA and MANOVA Test Procedures", in: P.R. Krishnaiah (ed.), Handbook of Statistics 1: Analysis of Variance, North-Holland Publishing Company, 1980.
9. I. Kärkkäinen and P. Fränti, "Dynamic local search for clustering with unknown number of clusters", Int. Conf. on Pattern Recognition (ICPR'02), Québec, Canada, vol. 2, 240-243, August 2002.
10. D. Pelleg and A. Moore, "X-means: Extending K-Means with Efficient Estimation of the Number of Clusters", Int. Conf. on Machine Learning (ICML), 727-734, San Francisco, 2000.
11. S. Salvador and P. Chan, "Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms", IEEE Int. Conf. Tools with Artificial Intelligence (ICTAI), 576-584, Boca Raton, Florida, November 2004.
12. M. Gyllenberg, T. Koski and M. Verlaan, "Classification of binary vectors by stochastic complexity", Journal of Multivariate Analysis, 63(1), 47-72, 1997.
13. X. Hu and L. Xu, "A Comparative Study of Several Cluster Number Selection Criteria", Int. Conf. Intelligent Data Engineering and Automated Learning (IDEAL), 195-202, Hong Kong, 2003.
14. L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons, London, 1990. ISBN-10: 0471878766.
15. M. Halkidi, Y. Batistakis and M. Vazirgiannis, "Cluster validity methods: part 1", SIGMOD Rec., Vol. 31, No. 2, pp. 40-45, 2002.
16. R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set via the gap statistic", J. R. Statist. Soc. B, 63, Part 2, pp. 411-423, 2001.
17. T. Lange, V. Roth, M. Braun and J.M. Buhmann, "Stability-based validation of clustering solutions", Neural Computation, Vol. 16, pp. 1299-1323, 2004.
18. Q. Zhao, M. Xu and P. Fränti, "Sum-of-squares based clustering validity index and significance analysis", Int. Conf. on Adaptive and Natural Computing Algorithms (ICANNGA'09), Kuopio, Finland, LNCS 5495, 313-322, April 2009.
19. Q. Zhao, M. Xu and P. Fränti, "Knee point detection on Bayesian information criterion", IEEE Int. Conf. Tools with Artificial Intelligence (ICTAI), Dayton, Ohio, USA, 431-438, November 2008.
20. W.M. Rand, "Objective criteria for the evaluation of clustering methods", Journal of the American Statistical Association, 66, 846-850, 1971.
21. L. Hubert and P. Arabie, "Comparing partitions", Journal of Classification, 2(1), 193-218, 1985.
22. P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: Cluster level similarity measure", Pattern Recognition, 2014. (accepted)