Words’ Semantic Orientation Analysis Based Text Orientation Identification
School: Beijing University of Posts and Telecommunications
Course: Signal and Information Processing
Keywords: Comprehensive Information Theory; Natural Language Processing (NLP); orientation identification; semantic orientation; k-means clustering
Semantic orientation is the attitude toward a subject that a text expresses. In linguistics it belongs to the concept of pragmatic information. Enabling a computer to identify the author's intention automatically is an important task in computational linguistics, and a high-performance automatic identification system is valuable for practical applications such as monitoring Internet opinion and information. To this end, this paper makes two main contributions.
First, to demonstrate the value of the Comprehensive Information Theory for NLP tasks, we design a system that automatically identifies the orientation of texts based on this theory. The system labels and extracts information from the text at three levels: syntactic, semantic and pragmatic. We then add each kind of labeled information to an SVM text classifier in turn. The experiments show that the system performs best when all three kinds of information (syntactic, semantic and pragmatic) are used.
Second, we study in depth the calculation of words' semantic orientation, which serves as the pragmatic-information component in the comprehensive processing of texts. Our experiments indicate that the computed semantic orientation of words needs a more convincing explanation and validation. We therefore introduce a clustering algorithm to analyze the words' semantic orientation further, and then feed the clustering result back into the original identification system. When the differences among the word classes obtained by clustering are reused in the original system, its performance improves. In this way, we provide an effective explanation for the calculation of words' semantic orientation.
The clustering approach also offers another way to improve the system. The paper closes with a summary of our work and directions for future research.
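The abstract does not state how the orientation scores are computed, how many clusters were used, or exactly how the word classes are fed back into the classifier. The sketch below only illustrates the general idea of running k-means over word orientation scores and reusing the cluster labels as text features. It assumes every word already has a real-valued orientation score (the scores in the demo are made up), and the helper names `kmeans_1d` and `cluster_histogram`, as well as the per-text cluster-count feature, are hypothetical rather than the thesis's own method.

```python
from typing import Dict, List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalar orientation scores; returns (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic initialisation: spread the k initial centroids across the sorted scores.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each score joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:  # converged: assignments no longer change the centroids
            break
        centroids = new
    return centroids, labels


def cluster_histogram(tokens: List[str], word_cluster: Dict[str, int], k: int) -> List[int]:
    """Hypothetical text feature: number of tokens falling into each orientation cluster."""
    counts = [0] * k
    for tok in tokens:
        if tok in word_cluster:
            counts[word_cluster[tok]] += 1
    return counts


if __name__ == "__main__":
    # Made-up orientation scores, for illustration only.
    scores = {"excellent": 0.9, "good": 0.6, "fine": 0.1, "table": 0.0, "poor": -0.5, "awful": -0.9}
    words = list(scores)
    centroids, labels = kmeans_1d([scores[w] for w in words], k=3)
    word_cluster = dict(zip(words, labels))
    text = "the food was good but the service was awful".split()
    print(cluster_histogram(text, word_cluster, 3))  # prints [1, 0, 1]
```

In this demo the three clusters end up ordered as negative, neutral and positive words, so the histogram counts one positive and one negative token. Appending such a vector to the SVM's feature vector is one plausible way to "reuse" the word classes; whether the thesis does it this way is an assumption.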