Dissertation

Words’ Semantic Orientation Analysis Based Text Orientation Identification

Author WuYun
Supervisor ZhongYiXin
School Beijing University of Posts and Telecommunications
Major Signal and Information Processing
Keywords Comprehensive Information Theory; Natural Language Processing (NLP); orientation identification; semantic orientation; k-means clustering
CLC TP391.1
Type Master's thesis
Year 2008
Downloads 231
Citations 2

Semantic orientation refers to the attitude toward a subject that a text expresses. In linguistics, it belongs to the category of pragmatic information. Enabling a computer to identify the intention of a text's author automatically is an important task in computational linguistics, and a high-performance automatic identification system is valuable for practical applications such as Internet public-opinion monitoring and information monitoring. To this end, this thesis makes two main contributions.

First, to demonstrate the value of the Comprehensive Information Theory for NLP tasks, we design a system that automatically identifies the orientation of texts based on this theory. The system labels and extracts information from the text at three levels: syntactic, semantic, and pragmatic. We then add the labeled information to an SVM text classifier one level at a time. The experiments show that the system performs best when all three kinds of information (syntactic, semantic, and pragmatic) are used.

Second, we study in depth the calculation of words' semantic orientation, which serves as the pragmatic-information component of the comprehensive text processing. Our experiments suggested that the computed word orientations need a more convincing explanation and validation. We therefore introduce a clustering algorithm to study and analyze words' semantic orientation further, and then feed the clustering result back into the original identification system. When the differences among the word classes obtained by clustering are incorporated into the original system, its performance improves. This gives an effective explanation of the word orientation calculation and, at the same time, offers another way to improve the system.

Finally, the thesis concludes with a summary of the work and directions for future research.
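
The abstract does not say how the three information levels are encoded or which SVM implementation was used. As a minimal sketch, assuming each text has already been reduced to one numeric feature vector per level (the dictionary keys `syntactic`, `semantic`, `pragmatic` and every function name below are hypothetical), the "add one level at a time" experiment could be organized as follows. A linear SVM trained with the Pegasos sub-gradient method is swapped in here purely so the example needs no third-party library.

```python
import random

# Hypothetical names for the three information levels, in the order they are added.
GROUPS = ["syntactic", "semantic", "pragmatic"]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularized hinge loss). Labels must be +1 / -1. A constant 1.0 feature
    is appended as the bias; for simplicity it is regularized like the other weights."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        # Visit the training examples in a fresh random order each epoch.
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regularizer)
            if margin < 1.0:  # hinge loss is active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score (bias included)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def build_features(doc, groups):
    """Concatenate the feature vectors of the selected information levels."""
    vec = []
    for g in groups:
        vec.extend(doc[g])
    return vec


def incremental_evaluation(train_docs, train_y, test_docs, test_y):
    """Add one information level at a time, retrain, and record test accuracy."""
    results = {}
    for n in range(1, len(GROUPS) + 1):
        active = GROUPS[:n]
        X_train = [build_features(d, active) for d in train_docs]
        X_test = [build_features(d, active) for d in test_docs]
        w = train_linear_svm(X_train, train_y)
        correct = sum(predict(w, x) == label for x, label in zip(X_test, test_y))
        results["+".join(active)] = correct / len(test_y)
    return results
```

Each entry of `train_docs` and `test_docs` is a dictionary mapping the three level names to lists of floats; the returned dictionary maps "syntactic", "syntactic+semantic", and "syntactic+semantic+pragmatic" to accuracies, mirroring the incremental comparison described in the abstract.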

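The abstract does not state how a word's semantic orientation is computed or which clustering parameters are used; only the keywords mention k-means clustering. The sketch below therefore rests on labeled assumptions: words are represented by vectors from some lexical resource, a word's orientation score is its mean similarity to positive seed words minus its mean similarity to negative seed words (a common seed-word scheme, not necessarily the thesis's), and the scalar scores are grouped with plain k-means (k = 3). All function names are hypothetical, and the final function is only one illustrative way to feed the word classes back into a text classifier.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def orientation(word_vec, pos_seeds, neg_seeds):
    """Hypothetical scoring rule: mean similarity to positive seed words minus
    mean similarity to negative seed words (> 0 leans positive)."""
    pos = sum(cosine(word_vec, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(word_vec, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar orientation scores. Centroids start at evenly
    spaced order statistics, so the result is deterministic."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each score goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return labels, centroids


def name_clusters(centroids):
    """For k = 3, name clusters by centroid order: lowest negative, highest positive."""
    if len(centroids) != 3:
        raise ValueError("name_clusters expects exactly three centroids")
    order = sorted(range(3), key=lambda c: centroids[c])
    names = ["negative", "neutral", "positive"]
    return {c: names[rank] for rank, c in enumerate(order)}


def cluster_count_features(tokens, word_cluster, k):
    """Hypothetical reuse of the clustering: count how many tokens of a text
    fall into each word class; words without a class are ignored."""
    counts = [0] * k
    for tok in tokens:
        if tok in word_cluster:
            counts[word_cluster[tok]] += 1
    return counts
```

Clustering one-dimensional scores keeps the word classes directly interpretable (their centroids are ordered from negative to positive), which is one way the clustering can serve as an independent check on the raw orientation values.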