Uploaded by shankar_mission

Conceptually, the K-means algorithm:

1. Selects K centroids (K rows chosen at random)

2. Assigns each data point to its closest centroid

3. Recalculates the centroids as the average of all data points in a cluster (i.e., the centroids are p-length mean vectors, where p is the number of variables)

4. Assigns data points to their closest centroids

5. Continues steps 3 and 4 until the observations are not reassigned or the maximum number of iterations (R uses 10 as a default) is reached.
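The steps above can be sketched in a few lines of R. This is a minimal Lloyd-style illustration under stated assumptions, not the Hartigan and Wong routine that kmeans() uses by default; the function name simple_kmeans and the toy data are hypothetical.

```r
# Minimal Lloyd-style K-means (hypothetical helper, for illustration only)
simple_kmeans <- function(x, k, iter.max = 10) {
  x <- as.matrix(x)
  # Step 1: pick k rows at random as the initial centroids
  centers <- x[sample.int(nrow(x), k), , drop = FALSE]
  # Squared Euclidean distance from every row of x to every centroid
  sq_dist <- function(centers) {
    sapply(seq_len(nrow(centers)),
           function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  }
  # Step 2: assign each observation to its closest centroid
  cluster <- max.col(-sq_dist(centers), ties.method = "first")
  for (iter in seq_len(iter.max)) {
    # Step 3: recompute each centroid as the mean vector of its members
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Step 4: reassign observations to the updated centroids
    new_cluster <- max.col(-sq_dist(centers), ties.method = "first")
    # Step 5: stop once no observation changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  list(cluster = cluster, centers = centers)
}

# Toy data: two groups of 20 points in two dimensions (made-up values)
set.seed(1234)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)
```

For real analyses the built-in kmeans() function is the better choice; the sketch only makes the loop structure visible.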

R uses an efficient algorithm by Hartigan and Wong (1979) that partitions the observations into k groups such that the sum of squares of the observations to their assigned cluster centers is a minimum. This means that in steps 2 and 4, each observation i is assigned to the cluster k with the smallest value of:

$$\sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{kj}\right)^2$$

where k is the cluster, $x_{ij}$ is the value of the jth variable for the ith observation, $\bar{x}_{kj}$ is the mean of the jth variable for the kth cluster, and p is the number of variables.
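As a small worked example of this criterion, the sketch below computes the squared distance from one observation to each of two cluster means and picks the smaller. The data values, the cluster means, and the helper name sq_dist are all made up for illustration.

```r
# Toy data: 3 observations on p = 2 variables (made-up values)
x <- matrix(c(1, 2,
              8, 9,
              2, 1), ncol = 2, byrow = TRUE)
# Two cluster mean vectors (also made up)
centers <- matrix(c(1.5, 1.5,
                    8.0, 9.0), ncol = 2, byrow = TRUE)

# Sum over variables j of (x_ij - xbar_kj)^2 for observation i and cluster k
sq_dist <- function(i, k) sum((x[i, ] - centers[k, ])^2)

# Criterion for observation 1 against each cluster
crit <- sapply(1:2, function(k) sq_dist(1, k))
crit             # 0.5 for cluster 1, 98 for cluster 2
which.min(crit)  # observation 1 goes to cluster 1
```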

K-means clustering can handle larger datasets than hierarchical cluster approaches. Additionally, observations are not permanently committed to a cluster. They are moved when doing so improves the overall solution. However, the use of means implies that all variables must be continuous, and the approach can be severely affected by outliers. K-means also performs poorly in the presence of non-convex (e.g., U-shaped) clusters.

In R, the function is called as kmeans(x, centers), where x is a numeric dataset (matrix or data frame) and centers is the number of clusters to extract. The function returns the cluster memberships, centroids, sums of squares (within, between, total), and cluster sizes.
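A brief sketch of inspecting these return values follows. It uses the built-in iris measurements purely as stand-in data (an assumption, since they are not part of this tutorial); the component names are those of the object returned by kmeans().

```r
set.seed(1234)
x <- scale(iris[, 1:4])      # built-in data, used only for illustration
fit <- kmeans(x, centers = 3)

fit$cluster[1:10]   # cluster membership of the first ten observations
fit$centers         # centroids, one row per cluster
fit$withinss        # within-cluster sum of squares, one value per cluster
fit$tot.withinss    # total within-cluster sum of squares
fit$betweenss       # between-cluster sum of squares
fit$totss           # total sum of squares
fit$size            # number of observations in each cluster
```

The total sum of squares splits into the within and between parts, so fit$totss equals fit$tot.withinss + fit$betweenss.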

Since K-means cluster analysis starts with k randomly chosen centroids, a different solution can be obtained each time the function is invoked. Use the set.seed() function to guarantee that the results are reproducible. Additionally, this clustering approach can be sensitive to the initial selection of centroids. The kmeans() function has an nstart option that attempts multiple initial configurations and reports on the best one. For example, adding nstart=25 will generate 25 initial configurations. This approach is often recommended.
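The effect of set.seed() and nstart can be checked directly. The sketch below again uses the built-in iris measurements as stand-in data (an assumption); the object names are arbitrary.

```r
x <- scale(iris[, 1:4])   # built-in data, used only for illustration

# Same seed, same call: the two solutions are identical
set.seed(1234); fit_a <- kmeans(x, centers = 3)
set.seed(1234); fit_b <- kmeans(x, centers = 3)
identical(fit_a$cluster, fit_b$cluster)

# Trying 25 starting configurations and keeping the best one
set.seed(1234); fit_multi <- kmeans(x, centers = 3, nstart = 25)
c(single = fit_a$tot.withinss, multi = fit_multi$tot.withinss)
```

Comparing the two totals shows whether the single random start happened to land on as good a solution as the best of 25.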

K-means clustering requires that the number of clusters to extract be specified in advance. Again, the NbClust package can be used as a guide. Additionally, a plot of the total within-groups sums of squares against the number of clusters in a K-means solution can be helpful. A bend in the graph can suggest the appropriate number of clusters. The graph can be produced by the following function.

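# Plot the total within-groups sum of squares for 1 to nc clusters
# (the default values for nc and seed are assumptions)
wssplot <- function(data, nc=15, seed=1234){
  # With a single cluster, the within-groups SS equals the total SS
  wss <- (nrow(data)-1)*sum(apply(data,2,var))
  for (i in 2:nc){
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers=i)$withinss)}
  plot(1:nc, wss, type="b", xlab="Number of Clusters",
       ylab="Within groups sum of squares")}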

The data parameter is the numeric dataset to be analyzed, nc is the maximum number of clusters to consider, and seed is a random number seed.
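The same quantities can be computed without drawing the plot, which makes the bend easier to check numerically. The sketch below is a hypothetical variant (wss_curve is not part of the tutorial) and uses the built-in iris measurements as stand-in data.

```r
# Hypothetical non-plotting variant: returns the within-groups SS for 1..nc clusters
wss_curve <- function(data, nc = 10, seed = 1234) {
  data <- as.matrix(data)
  wss <- numeric(nc)
  # With one cluster, the within-groups SS is the total SS about the column means
  wss[1] <- sum(scale(data, scale = FALSE)^2)
  for (i in 2:nc) {          # assumes nc >= 2
    set.seed(seed)
    wss[i] <- kmeans(data, centers = i)$tot.withinss
  }
  wss
}

x <- scale(iris[, 1:4])   # built-in data, used only for illustration
wss <- wss_curve(x, nc = 6)
round(wss, 1)
```

For standardized data the first value is (n - 1) times the number of variables, here 149 * 4 = 596, which gives a quick sanity check.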

The wine dataset analyzed here originally comes from the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html), but we will access it via the rattle package. A K-means cluster analysis of the data is provided in listing 1.

> data(wine, package="rattle")
> head(wine)
  Type Alcohol Malic  Ash Alcalinity Magnesium Phenols Flavanoids
1    1   14.23  1.71 2.43       15.6       127    2.80       3.06
2    1   13.20  1.78 2.14       11.2       100    2.65       2.76
3    1   13.16  2.36 2.67       18.6       101    2.80       3.24
4    1   14.37  1.95 2.50       16.8       113    3.85       3.49
5    1   13.24  2.59 2.87       21.0       118    2.80       2.69
6    1   14.20  1.76 2.45       15.2       112    3.27       3.39
  Nonflavanoids Proanthocyanins Color  Hue Dilution Proline
1          0.28            2.29  5.64 1.04     3.92    1065
2          0.26            1.28  4.38 1.05     3.40    1050
3          0.30            2.81  5.68 1.03     3.17    1185
4          0.24            2.18  7.80 0.86     3.45    1480
5          0.39            1.82  4.32 1.04     2.93     735
6          0.34            1.97  6.75 1.05     2.85    1450

> df <- scale(wine[-1])                                   #1
> wssplot(df)                                             #2
> library(NbClust)
> set.seed(1234)
> nc <- NbClust(df, min.nc=2, max.nc=15, method="kmeans")
> table(nc$Best.n[1,])

 0  2  3  8 13 14 15
 2  3 14  1  2  1  1

> barplot(table(nc$Best.n[1,]),
          xlab="Number of Clusters", ylab="Number of Criteria",
          main="Number of Clusters Chosen by 26 Criteria")

> set.seed(1234)
> fit.km <- kmeans(df, 3, nstart=25)                      #3
> fit.km$size
[1] 62 65 51
> fit.km$centers
  Alcohol Malic   Ash Alcalinity Magnesium Phenols Flavanoids Nonflavanoids
1    0.83 -0.30  0.36      -0.61     0.576   0.883      0.975        -0.561
2   -0.92 -0.39 -0.49       0.17    -0.490  -0.076      0.021        -0.033
3    0.16  0.87  0.19       0.52    -0.075  -0.977     -1.212         0.724
  Proanthocyanins Color   Hue Dilution Proline
1           0.579  0.17  0.47     0.78    1.12
2           0.058 -0.90  0.46     0.27   -0.75
3          -0.778  0.94 -1.16    -1.29   -0.41
> aggregate(wine[-1], by=list(cluster=fit.km$cluster), mean)
  cluster Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
1       1      14   1.8 2.4         17       106     2.8        3.0
2       2      12   1.6 2.2         20        88     2.2        2.0
3       3      13   3.3 2.4         21        97     1.6        0.7
  Nonflavanoids Proanthocyanins Color  Hue Dilution Proline
1          0.29             1.9   5.4 1.07      3.2    1072
2          0.35             1.6   2.9 1.04      2.8     495
3          0.47             1.1   7.3 0.67      1.7     620

#1 standardize data

#2 determine number of clusters

#3 K-means cluster analysis

Since the variables vary in range, they are standardized prior to clustering (#1). Next, the number of clusters is determined using the wssplot() and NbClust() functions (#2). Figure 1 indicates that there is a distinct drop in the within-groups sum of squares when moving from 1 to 3 clusters. After three clusters, the decrease levels off, suggesting that a 3-cluster solution may be a good fit to the data. In figure 2, 14 of 24 criteria provided by the NbClust package suggest a 3-cluster solution. Note that not all 30 criteria can be calculated for every dataset.

A final cluster solution is obtained with the kmeans() function, and the cluster centroids are printed (#3). Since the centroids provided by the function are based on standardized data, the aggregate() function is used along with the cluster memberships to determine variable means for each cluster in the original metric.
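The standardize, cluster, then aggregate pattern can be tried without installing any packages. The sketch below substitutes the built-in iris measurements for the wine data (an assumption made only so the example is self-contained); the object names are arbitrary.

```r
raw <- iris[, 1:4]   # stand-in data in its original units
z <- scale(raw)      # standardize: each column gets mean 0 and sd 1

set.seed(1234)
fit <- kmeans(z, centers = 3, nstart = 25)

fit$centers          # centroids in standardized units

# Cluster means in the original metric, using the cluster memberships
means <- aggregate(raw, by = list(cluster = fit$cluster), FUN = mean)
means
```

Clustering is done on z so that no variable dominates because of its scale, while the summary is computed on raw so that the cluster profiles can be read in the original measurement units.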
