
Research on Text Clustering Based on Hownet

Author ZhangLong
Tutor ChaiXin
School Hebei University of Technology
Course Computer technology
Keywords text clustering; vector space model; HowNet; textual similarity
CLC TP391.3
Type Master's thesis
Year 2012

The K-Means algorithm is a classical data mining algorithm. Its advantages are a simple form and low time and space cost, and it is widely used in text mining. This thesis studies the key techniques and algorithms of text clustering, proposes a new HowNet-based method for calculating text similarity, and improves the K-Means algorithm. The main work is to explore how three text similarity measures affect K-Means: the classical measure based on the vector space model, a measure based on HowNet, and a measure that also takes word position into account. K-Means is implemented with each of the three measures. To define the HowNet-based measure, the thesis proposes a new way of generating the vector space. Each text's vector is built from the words of that text alone, so the dimension of the vector equals the number of words in that text rather than the number of words in the whole text collection. This reduces both the high dimensionality and the sparsity of the representation. The thesis also discusses the relationship between this space and Euclidean space. To define the position-aware measure, the thesis proposes that the similarity of two words should be determined by both the similarity of their meanings and the similarity of their positions. It also explores how to correct the similarity computed for two words.
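
The abstract does not give formulas, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It represents each text by its own word list and scores two texts by averaging each word's best match in the other text. Everything named here is an assumption made for illustration: `exact_match_sim` is a hypothetical stand-in for a HowNet-based word similarity, `position_sim` is an assumed relative-position measure, and the weight `alpha` with its weighted-sum combination is one possible way to mix meaning and position.

```python
from typing import Callable, List

WordSim = Callable[[str, str], float]


def exact_match_sim(a: str, b: str) -> float:
    """Hypothetical stand-in for a HowNet-based word similarity in [0, 1]."""
    return 1.0 if a == b else 0.0


def position_sim(i: int, len_a: int, j: int, len_b: int) -> float:
    """Assumed measure: 1 minus the gap between relative positions."""
    pos_a = i / max(len_a - 1, 1)
    pos_b = j / max(len_b - 1, 1)
    return 1.0 - abs(pos_a - pos_b)


def text_similarity(
    a: List[str],
    b: List[str],
    word_sim: WordSim = exact_match_sim,
    alpha: float = 1.0,
) -> float:
    """Symmetric average of best word matches between two word lists.

    alpha = 1.0 uses meaning only; smaller values give more weight to
    position (the weighted sum is an assumption, not the thesis's formula).
    """
    if not a or not b:
        return 0.0

    def directed(x: List[str], y: List[str]) -> float:
        total = 0.0
        for i, wx in enumerate(x):
            # Best combined score of word wx against every word in y.
            total += max(
                alpha * word_sim(wx, wy)
                + (1.0 - alpha) * position_sim(i, len(x), j, len(y))
                for j, wy in enumerate(y)
            )
        return total / len(x)

    # Average both directions so the measure is symmetric.
    return 0.5 * (directed(a, b) + directed(b, a))
```

Because each text keeps only its own words, no collection-wide vocabulary vector is needed. A clustering loop such as K-Means could call `text_similarity` in place of a cosine measure, although how the thesis defines cluster centers in this space is not stated in the abstract.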
