
The Study of Clustering Algorithm Based on Density

Author LiWeiXiong
Tutor TanJianHao
School Hunan University
Course Control Science and Engineering
Keywords Data Mining; Cluster Analysis; Density Clustering; Grid Clustering; Intrusion Detection
CLC TP311.13
Type Master's thesis
Year 2010

Density-based clustering occupies an important position in cluster analysis. It is widely used in finance, marketing, information retrieval, information filtering, scientific observation, and engineering, and it is a focus of research in the field. This thesis proposes an improved algorithm based on the density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Density-grid clustering integrates density-based clustering with grid-based clustering and combines the advantages of both; this thesis also presents an improved density-grid clustering method. The work covers the following aspects:

(1) The development of data mining and of cluster analysis techniques is reviewed. The basic principles and data structures of cluster analysis, the commonly used clustering techniques, and data preprocessing methods are described.

(2) DBSCAN is sensitive to its parameters and has difficulty clustering data sets whose density is unevenly distributed. To address this, the thesis proposes an improved algorithm based on area ratio (regional proportion). The algorithm measures the density of a point from the region in which the point lies, and it defines candidate core points to make the search for clusters more efficient. Outliers in the data set are detected with the density-based outlier detection method LOF (local outlier factor).

(3) Grid-based methods have the property that clustering time is independent of the size of the data set. Building on this, the thesis presents an improved clustering algorithm based on grid density. The algorithm maps the data onto a grid structure through a density function, applies threshold segmentation to the grid, and clusters the density-connected regions of the resulting binarized grid. The method keeps the clustering-time advantage of grid-based methods and can also cluster data of arbitrary shape.

(4) Starting from a general intrusion detection model, an intrusion detection model based on density clustering is built, and the area-ratio clustering method is applied to training the intrusion knowledge base. The experimental results verify that the area-ratio clustering algorithm is effective in this application.

Experiments show that the area-ratio clustering algorithm, with its new density measure and regional-proportion idea, clusters data sets of uneven density better than DBSCAN and is also more robust to its parameters, which achieves the intended purpose. The grid-density clustering algorithm clusters data of arbitrary shape with a clustering time independent of the size of the data set, and it is a good complement to density-based clustering algorithms.
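
The grid-density procedure outlined in item (3) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give the density function, the threshold rule, or the connectivity used, so this sketch assumes two-dimensional points, a plain per-cell point count as the density, a fixed threshold `min_count`, and 8-connectivity between cells. The names `grid_density_cluster`, `cell_size`, and `min_count` are hypothetical.

```python
from collections import Counter, deque


def grid_density_cluster(points, cell_size, min_count):
    """Label 2-D points by connected regions of dense grid cells.

    Returns one label per point; -1 marks points in sparse cells (noise).
    Assumed for illustration: density = number of points per cell.
    """
    # Map every point to the integer coordinates of its grid cell.
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cells)

    # Threshold segmentation: binarize the grid into dense / sparse cells.
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Label 8-connected regions of dense cells with breadth-first search.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1

    # Each point inherits the label of its cell.
    return [cell_label.get(cell, -1) for cell in cells]


# Two dense groups and one isolated point.
demo_points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # cell (0, 0)
    (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),   # cell (1, 0), adjacent to (0, 0)
    (5.1, 5.1), (5.2, 5.3), (5.4, 5.2),   # cell (5, 5)
    (9.5, 0.5),                           # lone point, sparse cell
]
demo_labels = grid_density_cluster(demo_points, cell_size=1.0, min_count=3)
```

In this sketch the points are scanned once to fill the grid, and the region labelling then works only on occupied cells, so that stage depends on the number of cells and not on the number of points.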
