Word Segmentation and POS Tagging in Chinese

Author LiuDongXu
Tutor YangGuoWei
School University of Electronic Science and Technology
Course Computer Software and Theory
Keywords Chinese natural language processing; Segmentation; Forward maximum matching; Reverse maximum matching; Intersection field; Binding degree; Chinese name recognition; Part-of-speech tagging
CLC TP391.1
Type Master's thesis
Year 2003
Downloads 377
Citations 10

Word segmentation and part-of-speech (POS) tagging are fundamental to Chinese natural language processing (NLP). Earlier members of the research group did a great deal of work in this area; this thesis summarizes and improves on that work to provide better support for follow-up research. The earlier work segmented text mainly with forward maximum matching (MM) combined with reverse maximum matching (RMM), and selected a segmentation for each maximal intersection field by comparing the binding degrees of the candidates. That approach, however, handles only some maximal intersection fields. Based on statistics of the maximal intersection fields found in a large body of real text, this thesis divides them into three categories and treats each category separately, which greatly improves the handling of maximal intersection fields. Chinese name recognition is an important part of segmentation. This thesis gathers statistics from large-scale real text on the characters used as surnames and given names and on the characters that commonly appear immediately before and after names. Name recognition uses a surname found during segmentation as the trigger point; its recall and precision both exceed 90%. POS tagging is a difficult problem in Chinese NLP. In English, a change in a word's part of speech is often accompanied by a change in the word's form; in Chinese the form does not change, which makes POS tagging harder. In addition to determining parts of speech by conventional methods, this thesis builds a POS determination rule table in which every word has a corresponding object; during tagging, the object for the word is retrieved from the table and used to determine its part of speech. A further task of this thesis is to port the system built by earlier group members from VC to Java so that it can be published online.
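
As an illustration of the MM/RMM step described in the abstract, the following is a minimal Python sketch, not the thesis's implementation. The function names, the tiny lexicon, the max_len limit, and the sample sentence are all assumptions made for the example. The binding-degree comparison and the three-category treatment are not specified in the abstract and are not reproduced here; the sketch only segments in both directions and reports the stretch of text where the two results disagree.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Left-to-right greedy segmentation: longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Right-to-left greedy segmentation: longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def boundaries(words):
    """Character offsets at which a segmentation places a cut."""
    cuts, pos = set(), 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def ambiguous_span(text, fwd, rev):
    """Smallest stretch, bounded by cuts both passes agree on, that covers
    every disagreement. Returns None when the two segmentations are identical."""
    a, b = boundaries(fwd), boundaries(rev)
    diff = a ^ b
    if not diff:
        return None
    common = (a & b) | {0}  # offset 0 is an implicit shared boundary
    start = max(c for c in common if c < min(diff))
    end = min(c for c in common if c > max(diff))
    return text[start:end]


# Hypothetical toy lexicon and sentence, chosen only to show a disagreement.
lexicon = {"研究", "研究生", "生命", "起源"}
text = "研究生命起源"

fwd = forward_max_match(text, lexicon)   # ['研究生', '命', '起源']
rev = reverse_max_match(text, lexicon)   # ['研究', '生命', '起源']
print(fwd, rev, ambiguous_span(text, fwd, rev))  # span is '研究生命'
```

On the sample sentence the two passes disagree inside 研究生命, which is the kind of overlapping string that a disambiguation step, such as the binding-degree comparison mentioned in the abstract, would then have to resolve.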
