
The Study of XML Documents Clustering Based on the Semantic Tag Tree

Author ZuoHaiMing
Tutor PanYouNeng
School Zhejiang University
Course Information Science
Keywords WordNet, semantic similarity, XML, cluster
CLC TP391.1
Type Master's thesis
Year 2011

Since its release in 1998, XML has gradually become a standard for data representation and data exchange, owing to its simplicity, self-description, extensibility and openness. XML data is now abundant on the web, and XML data mining has become an increasingly popular research topic. After introducing XML technology and clustering algorithms for XML documents, the thesis reviews existing work on computing the similarity of XML documents. Current similarity measures rely only on string comparison of tags and do not consider semantic information. To address this, the thesis proposes a new similarity measure based on the semantic tag tree. The method works on document paths and combines structural and semantic information. First, it applies WordNet-based word sense disambiguation to the tags that the documents have in common. Next, it computes the semantic relatedness between differing tags. Finally, it measures document similarity from both the shared tags and the semantic relatedness of the differing tags. Clustering experiments on real data sets show that the method is effective for clustering XML documents.
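The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a path-based similarity that credits semantically related tags as well as identical ones. It is not the thesis's actual method. The `RELATEDNESS` table is a hypothetical stand-in for scores that would be derived from WordNet after word sense disambiguation, and the position-by-position path alignment and best-match averaging are assumptions chosen for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical relatedness scores standing in for WordNet-derived values.
RELATEDNESS = {
    frozenset(("author", "writer")): 0.9,
    frozenset(("book", "publication")): 0.8,
}


def tag_relatedness(a, b):
    """Return 1.0 for identical tags, a table score for related tags, else 0.0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def root_to_leaf_paths(xml_text):
    """Collect the set of root-to-leaf tag paths of a document as tuples."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def path_similarity(p, q):
    """Align two paths position by position (an assumed, simplistic alignment).

    Dividing by the longer length penalizes paths of different depth.
    """
    total = sum(tag_relatedness(a, b) for a, b in zip(p, q))
    return total / max(len(p), len(q))


def document_similarity(xml_a, xml_b):
    """Average best-match path similarity, symmetrized over both directions."""
    paths_a = root_to_leaf_paths(xml_a)
    paths_b = root_to_leaf_paths(xml_b)

    def directed(src, dst):
        return sum(max(path_similarity(p, q) for q in dst) for p in src) / len(src)

    return (directed(paths_a, paths_b) + directed(paths_b, paths_a)) / 2


doc_a = "<book><title>T</title><author>A</author></book>"
doc_b = "<publication><title>X</title><writer>B</writer></publication>"
doc_c = "<order><item>1</item><price>2</price></order>"

print(document_similarity(doc_a, doc_b))
print(document_similarity(doc_a, doc_c))
```

A pure string comparison would match only the `title` tag between `doc_a` and `doc_b`. With the relatedness table, `book`/`publication` and `author`/`writer` also contribute, so the two documents score as similar while `doc_c` does not. The resulting pairwise scores could then be fed to any standard clustering algorithm.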
