
International Journal of Advances in Computing & Communications Volume * No.*, ___________ 2013

A COMPARATIVE ANALYSIS BETWEEN K-MEANS AND FUZZY C-MEANS CLUSTERING ALGORITHMS

MOHITA BANSAL (mohita.bansal01@gmail.com)

MEENAKSHI CHAUDHARY (meenakshi.chaudhary30@gmail.com)

SWASTI SHARMA (sharma.swasti04@gmail.com)

ASSISTANCE BY: RAHUL SHARMA (rahul.gla@gmail.com)

Abstract: Clustering is the partitioning of data into groups of similar objects. Clustering is one of the methods used for segmentation. Segmentation of an image entails dividing the image into regions, each made up of pixels with similar attributes and differing in those attributes from the other regions. Several clustering methods and numerous clustering algorithms are available in existing software packages, and new ones frequently appear in the literature. These methods and algorithms vary depending on how the similarity between observations is defined and on other assumptions about the shapes of clusters, the distributions of variables, etc. The objective of this paper is to study and compare two data clustering algorithms: K-means and Fuzzy C-means clustering.

KEYWORDS: K-means clustering, Fuzzy C-means clustering.

Clustering has been used in different areas such as engineering, data mining, medicine and biology. It is also useful in various grouping, decision-making, exploratory pattern-analysis and machine-learning situations, including document retrieval, image segmentation and pattern classification.

1. INTRODUCTION
Cluster analysis collects patterns and forms clusters on the basis only of the information found in the data that describes the patterns and their relationships; the patterns grouped together should share similar features. The patterns within a cluster are more similar to each other than to patterns belonging to a different cluster. The greater the homogeneity within a cluster and the greater the difference between clusters, the better and more distinct the clustering. Clustering generates groups of persons, products or events which can be used to determine managerial strategy, or which are commonly the target of further analysis. Cluster analysis deals with finding structure in a collection of unlabeled data. It is important to understand the difference between clustering (unsupervised classification) and discriminant analysis (supervised classification, which relies on labeled patterns). Loosely defined, clustering is the process in which objects with similar characteristics or behaviour are grouped together; objects with dissimilar behaviour, which are not included in any group, are called outliers.

Fig 1: Patterns of different data items (top), clustering of patterns (bottom)

COMPONENTS OF CLUSTERING:

1. Representation of the patterns, which involves feature extraction (the use of one or more transformations of the input features to produce new salient features) and feature selection (the process of identifying the most effective subset of the original features to use in clustering).
2. Pattern proximity, which is measured by a distance function defined on pairs of patterns. A distance measure such as the Euclidean distance expresses the dissimilarity between patterns.
3. Grouping, the process of placing similar patterns together. It can be done in many ways, such as hierarchical, partitioning and agglomerative clustering, among other techniques.
4. Data abstraction, the process of extracting a compact representation of the patterns.
5. Output, which results in the formation of clusters.

Fig 2: Components of clustering

2. K-MEAN

The basic idea of k-means clustering is that clusters of patterns with the same target category are recognized, and forecasts for new data items are made by assuming that they are of the same type as the nearest cluster centre. The method can also be described in terms of k centroids, one for each cluster. First, patterns are selected at random to serve as the initial centroids. Then the distance from each pattern to the centroid of each group is compared, and the pattern is merged into the group whose centroid is closest. This continues until no pattern is left that does not belong to a group. When this step is complete, the positions of the k centroids are recomputed. A new binding is then made between each pattern and its closest new centroid. The process is repeated until the locations of the k centroids no longer change (the centroids do not move). In k-means clustering the distance between a pattern and a centroid is measured in terms of the Euclidean distance. Let the Euclidean distance between two multi-dimensional data points A = {a1, a2, ..., an} and B = {b1, b2, ..., bn} be described as:

$$D(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

where D is the Euclidean distance. The k-means method aims to minimize the sum of squared distances between all points and their cluster centres. It is well suited to generating globular clusters. The k-means method is numerical, unsupervised, non-deterministic and iterative.
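As a small illustration of the Euclidean distance that k-means uses to compare a pattern with a centroid, the following Python sketch computes D(A, B) for two points of equal dimension. The function name `euclidean_distance` and the sample points are illustrative assumptions and do not come from the paper.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical sample points, chosen so the result is easy to check by hand.
A = (0.0, 0.0)
B = (3.0, 4.0)
print(euclidean_distance(A, B))  # 5.0
```

A larger value means the two patterns are more dissimilar; a value of zero means they coincide.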


Let us consider n sample feature vectors a1, a2, ..., an which all belong to the same class and fall into k compact clusters, where k < n. Let xi be the mean of the vectors in cluster i. If the clusters are well separated, a minimum-distance classifier can be used to separate them: a is in cluster i if || a - xi || is the minimum of all the k distances. The procedure for finding the k means is then as follows:

o Make initial guesses for the means x1, x2, ..., xk.
o Until there is no change in any mean:
   o Use the estimated means to classify the samples into clusters.
   o For i from 1 to k, replace xi with the mean of all the samples in cluster i.
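The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kmeans`, the `max_iter` safeguard, and the choice to keep the previous mean when a cluster becomes empty are all assumptions of this example.

```python
import math


def kmeans(samples, initial_means, max_iter=100):
    """Cluster `samples` (equal-length tuples) around the guessed means.

    Returns (means, labels), where labels[j] is the index of the cluster
    that samples[j] was assigned to.
    """
    means = [tuple(x) for x in initial_means]
    k = len(means)
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Classify every sample by its nearest current mean.
        labels = [
            min(range(k), key=lambda i: math.dist(a, means[i]))
            for a in samples
        ]
        # Replace each mean x_i with the mean of the samples in cluster i.
        new_means = []
        for i in range(k):
            members = [a for a, lab in zip(samples, labels) if lab == i]
            if members:
                new_means.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous mean.
                new_means.append(means[i])
        if new_means == means:  # no mean changed, so the loop ends
            break
        means = new_means
    return means, labels


# Hypothetical data: two well-separated groups of three points each.
samples = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
           (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
means, labels = kmeans(samples, initial_means=[samples[0], samples[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the result depends on the initial guesses, a different choice of `initial_means` may lead to a different final partition on less clearly separated data.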

3. ALGORITHM CLUSTERING OF K-MEANS

STEP 1: Specify the number of clusters (k in k-means).
STEP 2: For each cluster select a centroid.
STEP 3: Assign each object to the group (of similar behaviour) with the nearest centroid.
STEP 4: Recalculate the positions of the k new centroids.
STEP 5: Repeat steps 3 and 4 until the centroids no longer change position.

The k-means algorithm does not necessarily find the optimal arrangement, that is, the one for which the sum-of-squared-distances function is at its minimum. It is also considerably sensitive to the initial randomly selected centroids. The k-means algorithm can be run multiple times to reduce this effect. The k-means algorithm can be better understood with the help of a simple example:

Fig 3: Using k means algorithm to find three data samples. Panels: (a) Iteration 1, (b) Iteration 2, (c) Iteration 3, (d) Iteration 4.

4. FUZZY C-MEAN

Fuzzy C-Means (FCM) is one of the earliest and most widely used fuzzy clustering algorithms. It is also referred to as soft clustering. As in the k-means algorithm, c centroids have to be defined in advance, one for each cluster; here each centroid is weighted by the degree to which the objects belong to the cluster. The clusters are modified at each step, and for each object a degree of membership in each of the c clusters is estimated. The distance from the object to the centroid of each group is compared on the basis of a criterion. The comparison uses a weighted average that takes into account the degree of membership of the object in each cluster. A list of the estimated degrees of membership of the object in each of the c clusters is thus established. The object is assigned to the cluster in which it has the highest degree of membership. The process is repeated until the criterion is fulfilled.

The fuzzy c-means algorithm tries to partition a finite collection of m elements A = {a1, a2, ..., am} into a collection of c fuzzy clusters on the basis of some criterion. Given a finite set of data, the FCM algorithm returns a list of c cluster centres C = {c1, ..., cc} and a partition matrix U,

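$$U = (u_{ij}), \quad u_{ij} \in [0, 1], \quad i = 1, \ldots, m, \quad j = 1, \ldots, c$$

where each element uij tells the degree to which element ai belongs to cluster cj. FCM aims to minimize a sum-of-squared-distances function. The function can be described as:

$$J = \sum_{i=1}^{m} \sum_{j=1}^{c} u_{ij}^{\,n} \, \lVert a_i - c_j \rVert^2$$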

It differs from the k-means sum-of-squares function only by the addition of the memberships uij and the fuzzifier n. The fuzzifier n defines the level of cluster fuzziness. A large n results in smaller uij and hence fuzzier clusters. In the limit n = 1, the memberships uij converge to 0 or 1, which implies a crisp partitioning. If the stopping criterion is not met, the computation is repeated with the new centroids.
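The iteration can be illustrated with the Python sketch below. The paper does not state the update rules, so the code assumes the standard FCM updates: each centroid is the mean of the points weighted by uij raised to the fuzzifier, and each membership is an inverse-distance weighting with exponent 2 / (n - 1). The names `random_memberships`, `objective`, `fuzzy_c_means`, `tol` and `max_iter`, and the sample data, are hypothetical.

```python
import math
import random


def random_memberships(m, c, seed=0):
    """Random membership rows, each normalised to sum to 1."""
    rng = random.Random(seed)
    U = []
    for _ in range(m):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])
    return U


def objective(points, centres, U, fuzzifier=2.0):
    """Membership-weighted sum of squared distances that FCM minimises."""
    return sum(
        U[i][j] ** fuzzifier * math.dist(p, cj) ** 2
        for i, p in enumerate(points)
        for j, cj in enumerate(centres)
    )


def fuzzy_c_means(points, U, fuzzifier=2.0, tol=1e-6, max_iter=200):
    """Return (centres, U) after alternating centroid and membership updates."""
    m, c, dim = len(points), len(U[0]), len(points[0])
    power = 2.0 / (fuzzifier - 1.0)  # assumes fuzzifier > 1
    centres = []
    for _ in range(max_iter):
        # Centroids: means of the points weighted by u_ij ** fuzzifier.
        centres = []
        for j in range(c):
            w = [U[i][j] ** fuzzifier for i in range(m)]
            centres.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                for d in range(dim)
            ))
        # Memberships: inverse-distance weighting over all centres.
        new_U = []
        for p in points:
            dists = [math.dist(p, cj) for cj in centres]
            if 0.0 in dists:
                # The point sits exactly on a centre: full membership there.
                hit = dists.index(0.0)
                new_U.append([float(j == hit) for j in range(c)])
            else:
                inv = [d ** -power for d in dists]
                new_U.append([v / sum(inv) for v in inv])
        # Stopping criterion (assumed): no membership moved by more than tol.
        change = max(abs(a - b) for r0, r1 in zip(U, new_U) for a, b in zip(r0, r1))
        U = new_U
        if change < tol:
            break
    return centres, U


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centres, U = fuzzy_c_means(points, random_memberships(len(points), 2))
# Hard assignment: the cluster in which each point has the highest membership.
labels = [row.index(max(row)) for row in U]
print(labels, round(objective(points, centres, U), 3))
```

Every point receives a membership in every cluster, so each iteration does more arithmetic per point than the single nearest-centroid comparison of k-means.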
Fig 4: Fuzzy c-means clustering

5. ALGORITHM CLUSTERING OF FUZZY C-MEANS

STEP 1: Specify the number of clusters (c).
STEP 2: Assign a random degree of membership in each cluster to every point.
STEP 3: Compute the cluster centroids.
STEP 4: Group the objects on the basis of the criterion.
STEP 5: Compute the Euclidean distance.
STEP 6: Assign each object to the group in which it has the highest degree of membership.
STEP 7: Repeat until the criterion is met.

Fig 5: Multi-dimensional fuzzy c-means clustering

6. RESULT

The paper compares k-means and fuzzy c-means clustering, which are very similar in approach. After analysing the algorithms we have come to the following conclusions. Both algorithms show some ambiguity on some data when clustering. Both k-means and fuzzy c-means clustering are recommended for huge data sets. Both are very sensitive to noise in the dataset; this noise makes it difficult for the algorithms to place an object in its suitable cluster.

The difference is that k-means clustering produces somewhat higher accuracy and requires less computation. Fuzzy c-means clustering produces results close to those of k-means clustering, yet it requires more computation time because of the fuzzy measure calculations involved in the algorithm. Fuzzy c-means will tend to run slower than k-means, since it is actually doing more work: each point is evaluated against each cluster, and more operations are involved in each evaluation. K-means only needs to do a distance calculation, whereas fuzzy c-means needs to do a full inverse-distance weighting. In fuzzy c-means clustering each point has a weighting associated with each cluster, so a point does not sit "in a cluster" so much as have a weak or strong association with the cluster.
