Research and Improvement on K-Means Clustering Algorithm
|School|Changsha University of Science and Technology|
|Course|Applied Computer Technology|
|Keywords|Clustering algorithms; K-means algorithm; Differential evolution algorithm|
With the rapid development of computer technology, people face all kinds of data, such as text, image, audio and video data, in very large quantities. How to extract implicit and valuable information from such massive data quickly and effectively has attracted much attention and is an urgent problem. Data mining (DM) emerged in this context and provides many efficient methods and tools for solving it. Clustering analysis is one of the most important of these methods and an essential part of data mining. As research on clustering analysis has intensified in recent years, its importance has been increasingly recognized, and clustering technology has produced substantial achievements in both theory and practice. At present, clustering analysis is widely applied in machine learning, pattern recognition, image processing, text classification, marketing, statistics and many other fields. According to data type, clustering purpose and application, existing clustering algorithms can be divided into partition-based, hierarchical, grid-based, density-based and model-based algorithms. The k-means algorithm, a partition-based method, is one of the most mature and classical clustering algorithms. This paper analyzes the merits and defects of k-means in depth and, because its results are liable to be affected by the choice of initial centers, proposes improvements to it. The main work is as follows:
1. To address the dependence of k-means on the selection of initial clustering centers, this paper puts forward a new method for selecting the initial centers. Experiments showed that the method effectively solves the problem of unstable clustering results caused by initial centers lying too close to each other, and improves the effectiveness and stability of the clustering results.
2. To address the sensitivity of k-means to initial center selection and its tendency to fall into local optima, this paper introduces the differential evolution algorithm, which has strong global optimization ability, into clustering. An improved differential evolution algorithm is proposed and combined with k-means, which solves the initial center optimization problem of k-means well. Experiments showed that this method effectively improves clustering quality and convergence speed.
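The abstract does not describe the paper's new center-selection rule, so the sketch below swaps in a common distance-based stand-in: the max-min (farthest-point) heuristic, which directly targets the "initial centers too close to each other" problem described in item 1. It is not the author's method. All names (`max_min_init`, `kmeans`, `sq_dist`, `sse`) and the toy data are hypothetical, and the code is a minimal pure-Python sketch rather than an optimized implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_min_init(data: List[Point], k: int) -> List[Point]:
    """Hypothetical stand-in seeding (max-min / farthest-point heuristic).

    Start from the data point closest to the overall mean, then repeatedly
    add the point whose distance to its nearest chosen center is largest,
    so that no two initial centers are overly close to each other.
    """
    dim = len(data[0])
    mean = tuple(sum(p[d] for p in data) / len(data) for d in range(dim))
    centers = [min(data, key=lambda p: sq_dist(p, mean))]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(data: List[Point], centers: List[Point], max_iter: int = 100):
    """Plain Lloyd iterations started from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


def sse(data: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Sum of squared errors, the objective k-means minimizes."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
             (10.0, 0.0), (10.4, 0.3), (9.8, 0.5),
             (0.0, 10.0), (0.3, 10.5), (0.6, 9.7)]
    init = max_min_init(blobs, 3)
    centers, labels = kmeans(blobs, init)
    print(labels)  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
    print(round(sse(blobs, centers, labels), 3))
```

Because the seeding is deterministic, repeated runs give the same partition, which is the kind of stability item 1 aims for; purely random seeding can instead place two centers in one blob and leave another blob without a center.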
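The abstract does not specify how the paper's improved differential evolution differs from the standard algorithm, so this sketch shows one plausible way to combine classic DE/rand/1/bin (fixed scale factor `F` and crossover rate `CR`) with k-means, as described in item 2. Each DE individual encodes all k centers as one flat vector, its fitness is the clustering SSE, and the best individual found seeds ordinary k-means. The function names and the parameter values (`pop_size=30`, `F=0.5`, `CR=0.9`, `generations=100`) are illustrative assumptions, not values from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse_of_centers(data: List[Point], centers: List[Point]) -> float:
    """Clustering objective: each point adds its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def decode(vec: List[float], k: int, dim: int) -> List[Point]:
    """Split a flat vector of length k*dim into k center points."""
    return [tuple(vec[j * dim:(j + 1) * dim]) for j in range(k)]


def de_init_centers(data, k, pop_size=30, generations=100, F=0.5, CR=0.9, seed=0):
    """Classic DE/rand/1/bin search for k initial centers minimizing SSE (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    n = k * dim
    # Each individual encodes k centers; seed it with k distinct random data points.
    pop = [[x for p in rng.sample(data, k) for x in p] for _ in range(pop_size)]
    fit = [sse_of_centers(data, decode(v, k, dim)) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation uses three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(n):
                if j == j_rand or rng.random() < CR:  # binomial crossover
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    d = j % dim
                    v = min(max(v, lo[d]), hi[d])  # keep inside the data's bounding box
                else:
                    v = pop[i][j]
                trial.append(v)
            f = sse_of_centers(data, decode(trial, k, dim))
            if f <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=lambda i: fit[i])
    return decode(pop[best], k, dim), fit[best]


def kmeans(data: List[Point], centers: List[Point], max_iter: int = 100):
    """Plain Lloyd iterations that refine the DE result."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
             (10.0, 0.0), (10.4, 0.3), (9.8, 0.5),
             (0.0, 10.0), (0.3, 10.5), (0.6, 9.7)]
    init, fit = de_init_centers(blobs, 3)
    centers, labels = kmeans(blobs, init)
    print(labels, round(sse_of_centers(blobs, centers), 3))
```

The design splits the work: DE explores the whole space of center sets, which helps avoid the local optima that a single random start can fall into, while k-means supplies fast local refinement. Since a Lloyd iteration never increases SSE, the refined result is at least as good as the DE seed.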