A Study of Chinese Word Sense Disambiguation Based on Hownet
|School||National University of Defense Science and Technology|
|Keywords||word sense disambiguation; How-Net; natural language processing; dependency grammar analysis|
Word sense disambiguation (WSD) has long been an important and difficult problem in natural language processing. It is widely used in natural language processing applications such as information retrieval, machine translation, text classification, and text summarization. At present, many WSD studies select only a few representative ambiguous words as disambiguation targets, which severely limits their use in real applications. This thesis addresses WSD for real text.

In this thesis, we first introduce the definition of WSD, its history, and its development trends. Second, we analyze the varieties of ambiguity in WSD and the corresponding disambiguation algorithms. Finally, based on our experience with and understanding of WSD, we propose a new WSD method and analyze several aspects of it, including the motivation for the strategy, its benefits, and how to apply it.

The method can be summarized as follows. After keywords are extracted from the preprocessed text, the ambiguous keywords are disambiguated according to their parts of speech and context. In the disambiguation process, the concepts of the keywords are first decomposed into sememes according to their definitions in How-Net. Then, to find which words restrict the word sense, full dependency grammar analysis is applied to identify the governing and governed relations among words from the internal structure of the sentence. Finally, based on the entity relations of the How-Net system and the atomic terms of the related words, the weight of each atomic term of the ambiguous word is computed, and the sense of the ambiguous word is determined according to these weights.

Experiments show that the proposed method achieves higher accuracy on general corpora and requires less computation time.

In short, although a great deal of effort has been devoted to this field, the accuracy of WSD remains relatively low because of the inherently fuzzy nature of word sense. How to improve the effectiveness of WSD will therefore be the motivation and objective of our further research in this field.
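
The disambiguation steps summarized in this abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and everything in it is an assumption made for illustration: the toy sense inventory `SENSES`, the dependency edges `EDGES`, the relation table `RELATED`, the scoring rule in `sememe_relatedness`, and the English example words all stand in for real How-Net definitions and a real dependency parser. The sketch also treats the "atomic terms" of the abstract as sememes.

```python
# Toy sense inventory (hypothetical, not real How-Net entries):
# word -> {sense label: set of sememes that define the sense}
SENSES = {
    "bank": {
        "bank#institution": {"institution", "finance"},
        "bank#land": {"land", "water"},
    },
    "deposit": {"deposit#put_money": {"put", "finance", "money"}},
    "money": {"money#currency": {"money", "finance"}},
}

# Hypothetical dependency edges (head, dependent) for
# "deposit money in the bank"; a real system would get these from a parser.
EDGES = [("deposit", "money"), ("deposit", "bank")]

# Hypothetical pairs of sememes assumed to be linked by some relation.
RELATED = {frozenset({"money", "finance"})}


def sememe_relatedness(a, b):
    """Assumed scoring rule: 1.0 if identical, 0.5 if related, else 0.0."""
    if a == b:
        return 1.0
    if frozenset({a, b}) in RELATED:
        return 0.5
    return 0.0


def restricting_words(target, edges):
    """Return words directly linked to the target: its heads and dependents."""
    linked = []
    for head, dep in edges:
        if dep == target:
            linked.append(head)
        elif head == target:
            linked.append(dep)
    return linked


def context_sememes(words, senses):
    """Collect every sememe of every sense of the given context words."""
    collected = set()
    for word in words:
        for sememes in senses.get(word, {}).values():
            collected |= sememes
    return collected


def sense_weights(target, edges, senses):
    """Weight each sense by how well its sememes match the context sememes."""
    ctx = context_sememes(restricting_words(target, edges), senses)
    weights = {}
    for label, sememes in senses[target].items():
        # Each sememe contributes its best match against the context.
        weights[label] = sum(
            max((sememe_relatedness(s, c) for c in ctx), default=0.0)
            for s in sememes
        )
    return weights


def disambiguate(target, edges, senses):
    """Pick the sense label with the largest weight."""
    weights = sense_weights(target, edges, senses)
    return max(weights, key=weights.get)


print(disambiguate("bank", EDGES, SENSES))
```

In this toy example only the head word "deposit" restricts "bank", so the financial sense receives the larger weight. Limiting the context to words linked by dependency relations, rather than using a fixed window, is the design choice the abstract motivates; the scoring rule here is only one simple way to turn sememe relations into weights.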