Research on the Attributes Selection Visualization Techniques and Clustering Algorithms
|School||Taiyuan University of Technology|
|Course||Computer Science and Technology|
|Keywords||data visualization; cluster analysis; improved k-means algorithm; outlier detection|
Visualization technology converts data into images and animations and presents them to the user in an intuitive way, so that the user can inspect the characteristics and correlations of the attributes, as well as the data distribution. The user thereby gains a better understanding of the information hidden in the data, which supports decision making. As data sizes increase, dimensionality reduction and attribute selection have become highly significant for visualization techniques.

Cluster analysis is an important field of machine learning. This paper takes clustering algorithms as its main line of study and uses visualization techniques in preprocessing and result analysis. The main work of this paper is as follows:

(1) This paper selected the FastMap algorithm and the MDS algorithm to map high-dimensional data, and compared the mapping characteristics of the two algorithms through simulation experiments. For attribute visualization, this paper chose the connecting vector diagram, which changes only a single point and does not show data boundaries clearly. To resolve this problem, this paper introduced multiple point changes into the connection vector tree, and ran experiments on simulated data sets and UCI data sets to test the feasibility of the visualization technique for attribute selection.

(2) This paper analyzed the kNN outlier detection algorithm, whose results are strongly affected by its parameter settings. By introducing a threshold radius and an intensity threshold, this paper proposed the nearest neighbor distance difference outlier detection algorithm. Experiments showed that the improved algorithm reduced the effect of the parameters, and that the user can adjust the intensity threshold to determine the strength of isolated points.

(3) This paper proposed an improved k-means algorithm based on the shortest distance between hierarchical classes, to resolve the defect that different initial centers affect the clustering results. The first stage is outlier detection based on the nearest neighbor distance algorithm; the second stage merges the subcategories with the shortest distance to obtain the cluster centers. Experimental results showed that, with optimized initial centers, the k-means clustering algorithm obtained stable results and a higher accuracy rate.

This paper integrated these algorithms in the Matlab environment to build the "attribute selection and clustering visualization experiment platform", which includes modules for displaying data, selecting algorithms, and processing results, and which also presents the clustering process of the k-means algorithm.
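The two-stage initialization described in item (3) can be illustrated with a small sketch. The abstract does not give the details of the nearest neighbor distance difference algorithm or of the merging rule, so the code below substitutes simple stand-ins: a plain mean-kNN-distance score with a median-based threshold for the outlier stage, and repeated merging of the two subclusters with the closest centroids for the center stage. The function names (`knn_outlier_mask`, `merged_initial_centers`, `kmeans`), the parameters `k` and `factor`, and the toy data are all assumptions made for illustration, and the sketch uses Python with NumPy rather than the Matlab platform built in this work.

```python
import numpy as np

def knn_outlier_mask(X, k=3, factor=3.0):
    """Stand-in outlier test (assumption): flag a point when its mean distance
    to its k nearest neighbours exceeds factor * median of that score."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore self-distances
    score = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return score > factor * np.median(score)

def merged_initial_centers(X, n_clusters):
    """Start with every point as its own subcluster and repeatedly merge the
    two subclusters whose centroids are closest, until n_clusters remain."""
    centers = [x.astype(float) for x in X]
    sizes = [1] * len(X)
    while len(centers) > n_clusters:
        C = np.array(centers)
        d = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        i, j = (int(v) for v in np.unravel_index(np.argmin(d), d.shape))
        i, j = min(i, j), max(i, j)
        total = sizes[i] + sizes[j]
        # size-weighted centroid of the merged subcluster
        centers[i] = (sizes[i] * centers[i] + sizes[j] * centers[j]) / total
        sizes[i] = total
        del centers[j], sizes[j]
    return np.array(centers)

def kmeans(X, centers, n_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    # final assignment against the final centers
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), centers

# Toy data (hypothetical): three 3x3 grids of points plus one isolated point.
offsets = np.array([[dx, dy] for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)])
blobs = [offsets + np.array(c, dtype=float) for c in ([0, 0], [5, 0], [0, 5])]
X = np.vstack(blobs + [np.array([[10.0, 10.0]])])

mask = knn_outlier_mask(X)                       # stage 1: remove outliers
X_clean = X[~mask]
init = merged_initial_centers(X_clean, 3)        # stage 2: merge to get centers
labels, centers = kmeans(X_clean, init)
```

Because the initial centers are computed deterministically from the data instead of being drawn at random, repeated runs on the same data give the same clustering, which is the kind of stability the abstract reports for the improved algorithm.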