Abstract: Clustering is the partitioning of data into groups of similar objects, and it is one of the methods used for segmentation. Segmentation of an image entails dividing the image into regions such that each region has similar attributes and different regions have dissimilar attributes. Several clustering methods and numerous clustering algorithms are available in existing software packages, and new ones frequently appear in the literature. These methods and algorithms vary depending on how the similarity between observations is defined and on other assumptions about the shapes of clusters, the distributions of variables, etc. The objective of this paper is to study and compare different data clustering algorithms, specifically K-means and Fuzzy C-means clustering.
KEYWORDS: K-means clustering, Fuzzy C-means clustering.
Clustering has been used in different areas such as engineering, data mining, medicine and biology. It is also useful in various grouping, decision-making, exploratory pattern-analysis, and machine-learning situations, including document retrieval, image segmentation, and pattern classification.
1. INTRODUCTION
Cluster analysis collects patterns and forms clusters on the basis of only the information found in the data that describes the patterns and their relationships; patterns placed in the same cluster should share similar features. Patterns within a cluster are more similar to each other than to patterns belonging to a different cluster. The greater the homogeneity within a cluster and the greater the difference between clusters, the more distinct and better the clustering. Clustering generates groups of persons, products or events, which can be used to determine managerial strategy or which become the target of further analysis. Cluster analysis deals with finding structure in a collection of unlabeled data. It is important to understand the difference between clustering and discriminant analysis: clustering is unsupervised and works on unlabeled data, whereas discriminant analysis is supervised and learns from patterns that are already labeled. Loosely defined, clustering is the process in which objects that have similar characteristics or behaviour are grouped together; objects with dissimilar behaviour that are not included in any group are called outliers.
Fig 1: Patterns of
COMPONENTS OF CLUSTERING:
1. Representation of the patterns, which involves feature extraction (the use of one or more transformations of the input features to produce new salient features) and feature selection (the process of identifying the most effective subset of the original features to use in clustering).
2. Pattern proximity, which is measured by a distance function defined between patterns. A distance measure such as the Euclidean distance is used to express the dissimilarity between patterns.
3. Grouping, which is the process of placing similar patterns together. It can be done in many ways, such as hierarchical, partitional and agglomerative clustering, among many other techniques.
4. Data abstraction, which is the process of extracting a compact representation of the patterns.
5. Output, which results in the formation of clusters.
www.ijacc.org
MohitaBansal,MeenakshiChaudhary,SwastiSharma
International Journal of Advances in Computing & Communications Volume * No.*, ___________ 2013
Let two n-dimensional data points be A = {a1, a2, ..., an} and B = {b1, b2, ..., bn}. The distance between them can be described as:
$$D(A,B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$
where D is the Euclidean distance. The k-means method aims to minimize the sum of squared distances between all points and their cluster centre. It is well suited to generating globular clusters. The K-Means method is numerical, unsupervised, non-deterministic and iterative.
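As a minimal illustration of the Euclidean distance D(A,B) and of the sum of squared distances that k-means tries to minimize, the two quantities could be computed as in the sketch below. The function names `euclidean_distance` and `sum_of_squared_distances` are hypothetical helpers chosen for this example and do not come from the paper.

```python
import math


def euclidean_distance(a, b):
    """Return D(A, B) = sqrt(sum_i (a_i - b_i)^2) for two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sum_of_squared_distances(points, centre):
    """Sum of squared Euclidean distances from each point to one cluster centre."""
    return sum(euclidean_distance(p, centre) ** 2 for p in points)


# Illustrative values only (a 3-4-5 right triangle and a two-point cluster).
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
sse = sum_of_squared_distances([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0))
```

For the illustrative inputs above, `d` is 5.0 and `sse` is 2.0.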
(a) Iteration 1
(b) Iteration 2
Let us consider n sample feature vectors a1, a2, ..., an which all belong to the same class and fall into k compact clusters, where k < n. Let xi be the mean of the vectors in cluster i. If the clusters are well separated, a minimum-distance classifier can be used to separate them: a is in cluster i if || a - xi || is the minimum of all the k distances. The procedure for finding the k means is then as follows:
o Make initial guesses for the means x1, x2, ..., xk.
o Until there is no change in any mean:
    o Use the estimated means to classify the samples into clusters.
    o For i from 1 to k, replace xi with the mean of all the samples in cluster i.
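The procedure above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `nearest_mean` and `k_means`, the random choice of k samples as initial means, the `max_iter` cap, the handling of empty clusters and the sample data are all assumptions that the text does not specify.

```python
import math
import random


def nearest_mean(a, means):
    """Minimum-distance classifier: index i that minimizes ||a - x_i||."""
    return min(range(len(means)), key=lambda i: math.dist(a, means[i]))


def k_means(samples, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initial guess (assumption): k distinct samples picked at random.
    means = rng.sample(samples, k)
    for _ in range(max_iter):
        # Classify every sample by its nearest current mean.
        labels = [nearest_mean(a, means) for a in samples]
        new_means = []
        for i in range(k):
            members = [a for a, lab in zip(samples, labels) if lab == i]
            if members:
                # Replace x_i with the mean of the samples in cluster i.
                new_means.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous mean.
                new_means.append(means[i])
        if new_means == means:  # no mean changed, so the loop ends
            break
        means = new_means
    labels = [nearest_mean(a, means) for a in samples]
    return means, labels


# Two well-separated groups of made-up points.
samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
means, labels = k_means(samples, k=2)
```

On well-separated data such as this, the first three samples end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initial guess.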
(d) Iteration 4
Fig 3: Using the k-means algorithm to find three clusters in the sample data.
3. ALGORITHM CLUSTERING OF K-MEANS
STEP 1: Specify the number of clusters (k in k-means).
STEP 2: For each cluster, select a centroid.
STEP 3: Assign each object to the group (having similar behaviour) with the closest centroid.
STEP 4: Recalculate the positions of the k centroids.
STEP 5: Repeat steps 3 and 4 until the centroids no longer change their position.
The k-means algorithm does not necessarily find the optimal arrangement, that is, the one with the minimum sum of squared distances. It is also considerably sensitive to the initial randomly selected centroids. The k-means algorithm can be run multiple times to reduce this effect. The algorithm can be better understood with the help of a simple example such as the one in Fig 3.
The fuzzy c-means (FCM) algorithm tries to partition a finite collection of m elements A = {a1, a2, ..., am} into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the FCM algorithm returns a list of c cluster centres C = {c1, ..., cc} and a partition matrix R = [uij], where each uij in [0, 1] gives the degree to which element ai belongs to cluster cj.
The fuzzy c-means objective function to be minimized is:
$$J = \sum_{i=1}^{m} \sum_{j=1}^{c} u_{ij}^{n} \, \| a_i - c_j \|^2$$
It differs from the k-means sum of squares function only by the addition of the membership values uij and the fuzzifier n. The fuzzifier n defines the level of cluster fuzziness. A large n results in smaller memberships uij and hence fuzzier clusters. In the limit n = 1, the memberships uij converge to 0 or 1, which implies a crisp partitioning.
If not, then compute again by taking the centroids.
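The effect of the fuzzifier n on the memberships can be illustrated with a small sketch. It assumes the standard fuzzy c-means membership update, u_j = 1 / sum_k (d_j / d_k)^(2/(n-1)), which the text does not state explicitly; the function name `memberships` and the example distances are hypothetical.

```python
def memberships(distances, n):
    """Memberships of one point, given its distances to every cluster centre.

    Assumes the standard FCM update u_j = 1 / sum_k (d_j / d_k) ** (2 / (n - 1)),
    which is defined for a fuzzifier n > 1 and strictly positive distances.
    """
    if n <= 1:
        raise ValueError("fuzzifier n must be greater than 1")
    exponent = 2.0 / (n - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in distances)
            for dj in distances]


# One point that is twice as far from the second centre as from the first.
distances = [1.0, 2.0]
near_crisp = memberships(distances, 1.1)  # n close to 1: almost 0 or 1
moderate = memberships(distances, 2.0)    # a commonly used fuzzifier value
very_fuzzy = memberships(distances, 5.0)  # large n: memberships even out
```

With these distances the largest membership shrinks as n grows, from nearly 1 at n = 1.1 to 0.8 at n = 2 and about 0.59 at n = 5, which matches the statement that a larger n gives fuzzier clusters.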
The paper compares k-means and fuzzy c-means clustering, which are very similar in approach. After analysing the algorithms, we have come to the following conclusions: Both algorithms show some ambiguity on some data when clustering. K-means and fuzzy c-means clustering are recommended for huge data sets. K-means and fuzzy c-means are both very sensitive to noise in the dataset; this noise makes it difficult for the algorithms to assign an object to its suitable cluster.
The difference is that K-means clustering produces fairly higher accuracy and requires less computation. Fuzzy c-means clustering produces results close to those of K-means clustering, yet it requires more computation time because of the fuzzy measure calculations involved in the algorithm. Each point is evaluated against each cluster, and more operations are involved in each evaluation: K-means only needs to do a distance calculation, whereas fuzzy c-means needs to do a full inverse-distance weighting. In fuzzy c-means clustering, each point has a weighting associated with each cluster, so a point does not sit "in a cluster" so much as have a weak or strong association to the cluster.
5. ALGORITHM CLUSTERING OF FUZZY C-MEANS
STEP 1: Specify the number of clusters (c).
STEP 2: Assign to each point a random degree of membership in each cluster.
STEP 3: Compute the cluster centroids.
STEP 4: Group the objects on the basis of the criterion.
STEP 5: Compute the Euclidean distance.
STEP 6: Assign each object to the group in which it has the highest degree of membership.
STEP 7: Repeat until the criterion is met.
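The steps above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the function name `fuzzy_c_means`, the default fuzzifier of 2, the stopping tolerance `tol`, the use of the standard membership and centroid update formulas, and the sample data are assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, c, n=2.0, max_iter=100, tol=1e-6, seed=0):
    """Sketch of fuzzy c-means on a list of tuples; n is the fuzzifier (n > 1)."""
    rng = random.Random(seed)
    dim = len(points[0])
    exponent = 2.0 / (n - 1.0)
    # STEP 2: random memberships, normalised so that each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(max_iter):
        # STEP 3: each centroid is the mean of the points weighted by u_ij ** n.
        centres = []
        for j in range(c):
            weights = [row[j] ** n for row in u]
            total = sum(weights)
            centres.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        # STEP 5: Euclidean distances to the centroids, then new memberships.
        new_u = []
        for p in points:
            # Distances are floored to avoid division by zero (assumption).
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centres]
            new_u.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                          for dj in dists])
        change = max(abs(a - b)
                     for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        # STEP 7: stop once the memberships no longer change noticeably.
        if change < tol:
            break
    # STEP 6: hard label = cluster with the highest degree of membership.
    labels = [max(range(c), key=lambda j: row[j]) for row in u]
    return centres, u, labels


# Two well-separated groups of made-up points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centres, u, labels = fuzzy_c_means(points, c=2)
```

Unlike k-means, the returned matrix `u` keeps a degree of membership for every point in every cluster; the hard labels are derived from it only at the end.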