Cluster Analysis

What is Cluster Analysis?

Cluster analysis is a statistical technique used to group cases (individuals or objects) into homogeneous subgroups based on responses to variables. PASW (SPSS) 17.0 offers three clustering procedures: two-step, k-means, and hierarchical.

K-means clustering allows you to select the number of clusters, and the procedure can be used with moderate to large data sets. The k-means clustering algorithm assigns each case to the cluster whose mean is the smallest distance from the case. This is an iterative process that stops once the cluster means do not change much in successive steps.
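The assign-then-update loop described above can be sketched in a few lines of Python. This is a minimal illustration of textbook k-means, not the PASW implementation; the function name k_means, the tolerance tol, and the four toy cases are assumptions made for the example.

```python
import math

def k_means(cases, centers, max_iter=20, tol=1e-6):
    """Textbook k-means: assign each case to the nearest center, then recompute means."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case goes to the center at the smallest Euclidean distance.
        labels = [min(range(len(centers)), key=lambda j: math.dist(case, centers[j]))
                  for case in cases]
        # Update step: each center becomes the mean of the cases assigned to it.
        new_centers = []
        for j, old in enumerate(centers):
            members = [case for case, lab in zip(cases, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        # Stop once no center moved by more than the (hypothetical) tolerance.
        shift = max(math.dist(o, n) for o, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return labels, centers

# Toy data: two obvious groups, with hypothetical starting centers.
cases = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels, centers = k_means(cases, centers=[[0.0, 0.0], [5.0, 5.0]])
print(labels, centers)
```

On this toy data the loop settles after the second pass, because the assignments and therefore the means stop changing.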

K-Means Clustering
As an example of k-means clustering, a sample PASW 17.0 data set was used: telco_extra.sav, telecommunications provider data that has 14 continuous variables. The continuous variables have already been standardized, with a mean of 0 and standard deviation of 1, to allow for the different units in which the variables were measured. This analysis will cluster customers by their service usage patterns.

In PASW 17.0, go to Analyze > Classify > K-Means Cluster.
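For readers whose data are not yet standardized, the transformation mentioned above can be sketched as follows. The sample values and the name standardize are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the handout does not say how the variables in telco_extra.sav were standardized.

```python
import statistics

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD with an n - 1 denominator
    return [(v - mean) / sd for v in values]

# Hypothetical raw measurements for one variable, in its original units.
minutes = [120.0, 300.0, 45.0, 510.0]
z = standardize(minutes)
print(z)
```

After this step every variable has a mean of 0 and a standard deviation of 1, so no single variable dominates the Euclidean distances simply because it was measured in larger units.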

Next, the K-Means Cluster Analysis menu appears. Select the variables Standardized log-long distance through Standardized log-wireless and Standardized multiple lines through Standardized electronic billing, and place them in the Variables box.

Label Cases by. Optional; place a variable here to label cases.
Number of Clusters. You have to specify the number of clusters you want. For this example, type 3 in the box.
Method. The default, "Iterate and classify," is an iterative process that recomputes the cluster means each time a case is added to or deleted from a cluster. Cases are then classified based on the updated cluster centers. With the "Classify only" method, cases are classified based on the initial cluster centers, which are not iteratively computed. For this example, Iterate and classify is chosen.
Cluster Centers. You can draw initial cluster centers from a file (Read initial) or you can save the final cluster centers (Write final). For this example, we are not using either option.
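To make the difference between the two methods concrete, the "Classify only" idea can be sketched as a single assignment pass against fixed centers, with no update step. The function name classify_only and the numbers are hypothetical; this illustrates the concept rather than the PASW code.

```python
import math

def classify_only(cases, centers):
    """Assign each case to its nearest fixed center; the centers are never updated."""
    return [min(range(len(centers)), key=lambda j: math.dist(case, centers[j]))
            for case in cases]

# Hypothetical fixed centers (for example, read from a file) and three cases.
fixed_centers = [[-1.0, -1.0], [1.0, 1.0]]
new_cases = [[-0.8, -1.2], [0.9, 1.1], [0.2, 0.1]]
assigned = classify_only(new_cases, fixed_centers)
print(assigned)
```

"Iterate and classify" would instead repeat this pass, recomputing the centers between passes until they settle.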

Click the Iterate button; the K-Means Cluster Analysis: Iterate box appears. Change Maximum Iterations to 20. Click Continue.

Maximum Iterations. Sets the maximum number of iterations.
Convergence Criterion. The default terminates once the largest change in means of any cluster is less than 2% of the minimum distance between initial cluster centers.
Use running means. If this box is checked, cluster centers will be updated after each case is classified, instead of after all of the cases are classified.
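The convergence rule described above can be written as a small check. This is a sketch of the rule as the text states it, with the 2% figure passed in as criterion=0.02; the function name has_converged and the example centers are assumptions.

```python
import math
from itertools import combinations

def has_converged(old_centers, new_centers, initial_centers, criterion=0.02):
    """True when the largest center shift is below criterion times the
    smallest distance between any two initial cluster centers."""
    min_initial = min(math.dist(a, b) for a, b in combinations(initial_centers, 2))
    largest_shift = max(math.dist(o, n) for o, n in zip(old_centers, new_centers))
    return largest_shift < criterion * min_initial

# Hypothetical centers: the closest initial pair is 10 units apart, so the threshold is 0.2.
initial = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
small_move = [[0.1, 0.0], [10.0, 0.0], [0.0, 10.0]]
large_move = [[0.5, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(has_converged(initial, small_move, initial))
print(has_converged(initial, large_move, initial))
```

Scaling the threshold by the initial separation makes the rule independent of the units of the variables.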

Click Options in the K-Means Cluster Analysis dialog box. Check Initial cluster centers, ANOVA table, Cluster information for each case, and Exclude cases pairwise. Click Continue. Click OK.

Initial cluster centers. Prints the initial variable means for each cluster in the output.
ANOVA table. ANOVA F-tests are conducted for each variable to indicate how well the variable discriminates between clusters.
Cluster information for each case. Prints each case's final cluster assignment and the Euclidean distance between the case and the cluster center in the output.
Missing Values. The default is listwise deletion. For this example, there are many missing values because most customers did not subscribe to all services, so excluding cases pairwise maximizes the information you can obtain from the data.
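One way to picture pairwise exclusion is a distance that uses only the variables a case actually has, so a customer with a missing value is still assigned to a cluster. The sketch below assumes exactly that and nothing more; whether PASW rescales such distances is not covered in this handout, and the name pairwise_distance and the values are hypothetical.

```python
import math

def pairwise_distance(case, center):
    """Euclidean distance over the variables for which the case has a value.
    Missing values are represented as None (an assumption of this sketch)."""
    terms = [(x - c) ** 2 for x, c in zip(case, center) if x is not None]
    if not terms:
        raise ValueError("case has no non-missing variables")
    return math.sqrt(sum(terms))

# Hypothetical customer with no value for the second service.
case = [3.0, None, 4.0]
center = [0.0, 9.9, 0.0]
d = pairwise_distance(case, center)
print(d)
```

Under listwise deletion the same customer would be dropped from the analysis entirely.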

K-Means Clustering Interpretation
The Initial Cluster Centers table shows the first step in k-means clustering: finding the k initial centers.

The Iteration History table shows the number of iterations needed before the cluster centers stopped changing substantially.

The Cluster Membership table gives you the cluster each case belongs to and the Euclidean distance of each case to its cluster center. Below is a printout of the first and last 10 cases. Visual inspection of the distances is necessary to check for outliers that may not adequately reflect the population.
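The distance column of such a table, and a simple screen for unusually distant cases, can be sketched as follows. The cutoff of 6.0 is an arbitrary value chosen for illustration, not a PASW rule, and all names and numbers are hypothetical.

```python
import math

def distances_to_centers(cases, labels, centers):
    """Euclidean distance from each case to the center of its assigned cluster."""
    return [math.dist(case, centers[lab]) for case, lab in zip(cases, labels)]

def flag_far_cases(distances, cutoff):
    """Indices of cases whose distance exceeds a user-chosen cutoff."""
    return [i for i, d in enumerate(distances) if d > cutoff]

# Hypothetical final centers, cases, and cluster assignments.
centers = [[0.0, 0.0], [10.0, 10.0]]
cases = [[0.0, 1.0], [3.0, 4.0], [10.0, 10.0], [16.0, 18.0]]
labels = [0, 0, 1, 1]

distances = distances_to_centers(cases, labels, centers)
suspects = flag_far_cases(distances, cutoff=6.0)
print(distances, suspects)
```

A flagged case is only a candidate outlier; it still needs to be inspected before deciding whether to keep it.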

The Final Cluster Centers table below allows you to describe the clusters by the variables. For example, customers in Cluster 1 tend to purchase a lot of services, as evidenced by values above the mean for all variables. Customers in Cluster 2 tend to purchase the "calling" services, shown by positive values for the four calling services (caller ID, call waiting, call forwarding, and 3-way calling). Customers in Cluster 3 tend to spend very little and do not purchase many services; they have negative values on most of the variables.

The Distances between Final Cluster Centers table shows the Euclidean distances between the final cluster centers. Greater distances between clusters mean there are greater dissimilarities.

Clusters 1 and 3 have the greatest dissimilarities.

Cluster 2 is equally similar to Clusters 1 and 3.
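A table of this kind is a symmetric matrix of pairwise Euclidean distances between the centers. The sketch below uses made-up centers chosen so that the pattern matches the one described above; they are not the values from the telco_extra.sav output.

```python
import math

def center_distance_matrix(centers):
    """Symmetric matrix of Euclidean distances between every pair of centers."""
    return [[math.dist(a, b) for b in centers] for a in centers]

# Hypothetical final centers for Clusters 1, 2, and 3.
final_centers = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
matrix = center_distance_matrix(final_centers)
for row in matrix:
    print(row)
```

Here the largest entry is between the first and third centers, and the second center sits the same distance from both.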

The ANOVA table indicates which variables contribute the most to your cluster solution. Variables with large mean square errors provide the least help in differentiating between clusters. For example, long distance and calling card had the two highest mean square errors (and lowest F statistics); therefore, the two variables were not as helpful as the other variables in forming and differentiating clusters.
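The link between a large mean square error and a small F statistic can be seen by computing a one-way ANOVA for a single variable by hand. This is a sketch of the standard one-way formula, F = (between-cluster mean square) / (mean square error); the name anova_f and both toy variables are hypothetical.

```python
def anova_f(values, labels):
    """One-way ANOVA for one variable across clusters.
    Returns (F statistic, mean square error)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-cluster sum of squares: spread of the cluster means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-cluster (error) sum of squares: spread of cases around their own cluster mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    ms_between = ss_between / (k - 1)
    ms_error = ss_within / (n - k)
    return ms_between / ms_error, ms_error

labels = [0, 0, 1, 1, 2, 2]
separated = [1.0, 2.0, 5.0, 6.0, 9.0, 10.0]    # clusters barely overlap on this variable
overlapping = [1.0, 9.0, 2.0, 10.0, 3.0, 11.0]  # clusters overlap heavily on this one

f_sep, mse_sep = anova_f(separated, labels)
f_over, mse_over = anova_f(overlapping, labels)
print(f_sep, mse_sep)
print(f_over, mse_over)
```

The overlapping variable has the larger mean square error and the smaller F, which is the pattern the text describes for long distance and calling card.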

The Number of Cases in each Cluster table illustrates the split of cases into clusters. A large number of cases were assigned to the third cluster, which is the least profitable group.
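Given a list of cluster assignments, the same split can be tallied in a couple of lines. The assignments below are invented for illustration and do not reproduce the counts from the handout's output.

```python
from collections import Counter

# Hypothetical cluster assignments (clusters numbered 1 to 3) for seven cases.
assignments = [3, 1, 3, 2, 3, 3, 1]

counts = Counter(assignments)
shares = {cluster: n / len(assignments) for cluster, n in counts.items()}
for cluster in sorted(counts):
    print(cluster, counts[cluster], round(shares[cluster], 2))
```

Comparing the shares is a quick check for a solution in which one cluster absorbs most of the cases.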
