K-Means Clustering
As an example of k-means clustering, a sample PASW 17.0 dataset was used: telco_extra.sav, telecommunications provider data that has 14 continuous variables. The continuous variables have already been standardized, with a mean of 0 and a standard deviation of 1, to allow for the different units in which the variables were measured. This analysis will cluster customers by their service usage patterns. In PASW 17.0, go to Analyze > Classify > K-Means Cluster.
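The standardization described above (mean of 0, standard deviation of 1) can be sketched in a few lines of Python. This is only an illustration: the function name `standardize` and the sample values are hypothetical and do not come from telco_extra.sav, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 denominator); this choice is an assumption
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]

# Hypothetical usage minutes, not taken from the sample dataset
minutes = [120.0, 300.0, 45.0, 210.0]
z = standardize(minutes)
print(z)
```

After this step every variable is on the same scale, so no single variable dominates the Euclidean distances only because of its units.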
Click Options in the K-Means Cluster Analysis dialog box. Check Initial cluster centers, ANOVA table, Cluster information for each case, and Exclude cases pairwise. Click Continue. Click OK.

Initial cluster centers. Prints the initial variable means for each cluster in the output.
ANOVA table. ANOVA F tests are conducted for each variable to indicate how well the variable discriminates between clusters.
Cluster information for each case. Prints each case's final cluster assignment and the Euclidean distance between the case and the cluster center in the output.
Missing Values. The default is listwise deletion. For this example, there are many missing values because most customers did not subscribe to all services, so excluding cases pairwise maximizes the information you can obtain from the data.
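To illustrate the Euclidean distance between a case and a cluster center, and what pairwise exclusion means for it, here is a minimal Python sketch. It assumes that a missing value is represented by `None` and that the distance is computed from only the variables the case has values for. The function name and data are hypothetical, and whether PASW rescales the distance for the number of variables used is not stated here; this sketch does not.

```python
import math

def distance_to_center(case, center):
    """Euclidean distance using only the variables the case has values for."""
    squared = [(x - c) ** 2 for x, c in zip(case, center) if x is not None]
    if not squared:
        raise ValueError("case has no non-missing values")
    return math.sqrt(sum(squared))

# Hypothetical standardized case with one missing variable (None)
case = [1.0, None, -2.0]
center = [0.0, 0.5, 1.0]
d = distance_to_center(case, center)  # sqrt(1**2 + (-3)**2)
print(d)
```

Under listwise deletion this case would be dropped entirely because of the single missing value; here it still contributes through its two observed variables.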
K-Means Clustering Interpretation
The Initial Cluster Centers table shows the first step in k-means clustering: finding the k initial centers.
The Iteration History table shows how many iterations were needed before the cluster centers stopped changing substantially.
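The idea of iterating until the centers stop changing can be sketched with a small, generic k-means loop in Python. This is a textbook-style sketch, not PASW's implementation: the function name `kmeans`, the `max_iter` and `tol` parameters, and the toy data are all assumptions made for illustration.

```python
import math

def kmeans(points, centers, max_iter=10, tol=1e-6):
    """Generic k-means loop; returns (centers, labels, iterations used)."""
    labels = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its assigned points
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(list(old))  # keep an empty cluster's center
        # Largest movement of any center during this iteration
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels, iteration

# Hypothetical two-variable data and starting centers
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
final_centers, labels, n_iter = kmeans(points, [[0.0, 0.0], [6.0, 5.0]])
print(final_centers, labels, n_iter)
```

The per-iteration `shift` value plays the role of the change in cluster centers that an iteration history reports: once it falls below the tolerance, the loop stops.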
The Distances between Final Cluster Centers table shows the Euclidean distances between the final cluster centers. Greater distances between clusters mean greater dissimilarities.
Clusters 1 and 3 have the greatest dissimilarities.
Cluster 2 is equally similar to Clusters 1 and 3.
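A table of distances between cluster centers can be reproduced in a few lines of Python. The centers below are hypothetical values chosen only to mirror the pattern described above (Clusters 1 and 3 farthest apart, Cluster 2 equidistant from both); they are not the values from the PASW output.

```python
import math
from itertools import combinations

# Hypothetical final cluster centers on two standardized variables
centers = {1: [-1.0, 0.0], 2: [0.0, 0.0], 3: [1.0, 0.0]}

# Euclidean distance for every pair of clusters
distances = {
    (a, b): math.dist(centers[a], centers[b])
    for a, b in combinations(sorted(centers), 2)
}
for pair, dist in distances.items():
    print(pair, dist)

# The pair with the largest distance is the most dissimilar
farthest = max(distances, key=distances.get)
print("most dissimilar:", farthest)
```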
The Number of Cases in each Cluster table illustrates the split of cases into clusters. A large number of cases were assigned to the third cluster, which is the least profitable group.
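The same kind of summary can be computed from a list of cluster assignments. The sketch below uses Python's `collections.Counter`; the assignments are made up for illustration and are not the counts from the PASW output.

```python
from collections import Counter

# Hypothetical final cluster assignments, one per case
assignments = [3, 1, 3, 2, 3, 1, 3]

cases_per_cluster = Counter(assignments)
for cluster in sorted(cases_per_cluster):
    print(cluster, cases_per_cluster[cluster])

# Cluster with the most cases
largest = cases_per_cluster.most_common(1)[0][0]
print("largest cluster:", largest)
```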