Research on Summarization Abstract Algorithm Based on Improved CVSM

Author GuoZhiBing
Tutor HuangGuangJun
School Henan University of Science and Technology
Course Applied Computer Technology
Keywords Automatic Summarization; HowNet; Primitive Co-occurrence Data; Conceptual Vector Space Model
CLC TP391.1
Type Master's thesis
Year 2009

Automatic summarization is an information-abstracting technique developed in response to the modern information age. It quickly and accurately extracts from a large text the sentences that express its meaning, assembles them into a summary, and helps readers obtain useful information efficiently. This thesis first reviews the current state of research and the related techniques. It then addresses the shortcomings of an existing Chinese summarization algorithm that combines statistical and semantic methods, and proposes an improved algorithm. The new algorithm improves on the previous method in two respects.
First, to handle the ambiguity of Chinese words, the thesis proposes an improved word sense disambiguation algorithm. The method uses HowNet and a corpus to build a primitive co-occurrence frequency database, which serves as the basis for disambiguation. When computing the correlation coefficient between a word sense and its context, it takes into account that the four kinds of primitive correspondences differ in their ability to express semantics. It also considers two distance factors that affect semantic expression: the distance between a feature word and the polysemous word, and the distance between the current polysemous word and the most recent occurrence of the same polysemous word whose sense has already been selected.
Second, to address the independence between terms of the CVSM, the thesis proposes a vague concept equivalence-class partitioning algorithm based on clustering. In view of practical significance, the algorithm merges concepts that are highly similar and show no distinct difference in semantic expression. Concept sets, rather than single concepts, are then used as the terms of the CVSM, and the improved CVSM is used to represent the text. This converts the text into data more accurately and so yields a more concise summary.
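
The disambiguation step described above can be illustrated with a minimal sketch. The abstract gives no formula, so everything concrete here is an assumption: the placeholder type names `type1` to `type4`, the weight values, the inverse-distance decay, the recency bonus, and the toy English data are all hypothetical. It is not the thesis's actual scoring function.

```python
from collections import Counter

# Illustrative assumptions only: the abstract says the four kinds of
# primitives differ in expressive power but gives no names or values.
TYPE_WEIGHTS = {"type1": 0.5, "type2": 0.2, "type3": 0.2, "type4": 0.1}
RECENCY_BONUS = 1.0  # hypothetical strength of the "last chosen sense" factor


def cooccurrence(cooc, p, q):
    """Symmetric lookup of how often two primitives co-occur in the corpus."""
    return cooc[frozenset((p, q))]


def sense_score(sense, target_pos, context, cooc, previous=None):
    """Score one candidate sense of the polysemous word at target_pos.

    sense    -- {"id": str, "primitives": {type_name: [primitive, ...]}}
    context  -- list of (position, [primitive, ...]) for nearby feature words;
                positions are assumed to differ from target_pos
    previous -- (position, sense_id) of the most recent resolved occurrence
                of the same word (assumed to precede target_pos), or None
    """
    score = 0.0
    for pos, ctx_primitives in context:
        # Distance factor 1: nearer feature words contribute more.
        distance_weight = 1.0 / abs(pos - target_pos)
        for type_name, primitives in sense["primitives"].items():
            for p in primitives:
                for q in ctx_primitives:
                    score += (distance_weight * TYPE_WEIGHTS[type_name]
                              * cooccurrence(cooc, p, q))
    # Distance factor 2: favor the sense chosen for a recent occurrence.
    if previous is not None and previous[1] == sense["id"]:
        score += RECENCY_BONUS / (target_pos - previous[0])
    return score


def disambiguate(senses, target_pos, context, cooc, previous=None):
    """Return the id of the highest-scoring candidate sense."""
    best = max(senses,
               key=lambda s: sense_score(s, target_pos, context, cooc, previous))
    return best["id"]


# Toy primitive co-occurrence database (hypothetical counts).
cooc = Counter({
    frozenset(("institution", "deposit")): 5,
    frozenset(("money", "deposit")): 8,
    frozenset(("institution", "money")): 4,
    frozenset(("water", "deposit")): 1,
})
senses = [
    {"id": "bank_finance",
     "primitives": {"type1": ["institution"], "type2": ["money"]}},
    {"id": "bank_river",
     "primitives": {"type1": ["land"], "type2": ["water"]}},
]
context = [(3, ["deposit"]), (6, ["money"])]
chosen = disambiguate(senses, 4, context, cooc)
```

In this toy setup the financial sense is selected because its primitives co-occur far more often with the primitives of the nearby feature words. Passing `previous` raises the score of whichever sense was chosen for the last occurrence, with the effect shrinking as the two occurrences move apart.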
Finally, an experimental system implementing the summarization algorithm based on the improved CVSM was developed for experimental verification. The results show that the improved algorithm outperforms the previous algorithms: both the accuracy and the recall of word sense disambiguation increase, and the quality of the generated summaries improves accordingly.
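
The core of a summarizer built on the improved CVSM might look like the sketch below: similar concepts are merged into equivalence classes, text is represented as a vector over those classes, and sentences are ranked against the whole document. The single-link merging rule, the similarity threshold, the cosine ranking, and the toy similarity table are all assumptions made for illustration. The thesis's own partitioning algorithm and sentence-weighting scheme are not given in the abstract.

```python
import math
from collections import Counter


def build_classes(concepts, similarity, threshold):
    """Assumed single-link grouping: any two concepts whose similarity
    reaches the threshold are placed in the same equivalence class."""
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    for i, a in enumerate(concepts):
        for b in concepts[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)
    return {c: find(c) for c in concepts}


def to_vector(concept_list, class_of):
    """Count occurrences per equivalence class instead of per concept."""
    return Counter(class_of[c] for c in concept_list)


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def summarize(sentences, similarity, threshold, k):
    """sentences: one list of concepts per sentence.
    Returns the indices of the k best sentences, in document order."""
    concepts = sorted({c for s in sentences for c in s})
    class_of = build_classes(concepts, similarity, threshold)
    doc = to_vector([c for s in sentences for c in s], class_of)
    scores = [cosine(to_vector(s, class_of), doc) for s in sentences]
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical similarity table standing in for a HowNet-based measure.
SIM = {frozenset(("car", "automobile")): 0.9,
       frozenset(("car", "road")): 0.3}


def toy_similarity(a, b):
    return SIM.get(frozenset((a, b)), 0.0)


sentences = [["car", "engine"],
             ["automobile", "engine", "road"],
             ["weather"]]
summary = summarize(sentences, toy_similarity, threshold=0.8, k=2)
```

Because "car" and "automobile" fall into one class, the first two sentences share two vector dimensions rather than one, which is the effect the equivalence classes are meant to have on the text representation.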
