Word Segmentation and POS Tagging in Chinese
|School||University of Electronic Science and Technology|
|Course||Computer Software and Theory|
|Keywords||Chinese natural language processing; Word segmentation; Forward maximum matching; Reverse maximum matching; Intersection field; Binding degree; Chinese name recognition; Part-of-speech tagging|
Word segmentation and part-of-speech (POS) tagging are basic tasks in Chinese natural language processing (NLP). Earlier students in our group have done a great deal of research in this area. The work in this thesis consolidates their results and improves on them, in order to provide better support for follow-up studies.

Segmentation in the earlier work mainly used forward maximum matching (MM) combined with reverse maximum matching (RMM), and compared binding degrees within each maximum intersection field to select a segmentation. However, this method can handle only some of the maximum intersection fields. Based on statistics of the maximum intersection fields found in large-scale real text, this thesis divides them into three categories and treats each category separately, which greatly improves the handling of maximum intersection fields.

Chinese name recognition is an important part of segmentation. This thesis collects statistics from large-scale real text on the characters used as surnames, the characters used as given names, and the words that commonly appear before and after names. Name recognition uses a surname found during segmentation as the trigger point for starting the name judgment. Its recall and precision are both above 90%.

POS tagging is a difficult problem in Chinese NLP. In English, a change in a word's part of speech is often accompanied by a change in its word form. In Chinese the word form does not change, which makes Chinese POS tagging harder. In addition to determining parts of speech by conventional methods, I built a POS determination rule table in which every word has a corresponding object. During POS judgment, the object for the word is retrieved from the rule table and used to decide the word's part of speech.

A further task of this thesis was to port the system built by the earlier students from VC to Java, so that it can be published online.
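The two matching strategies named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the `max_len` window, and the toy lexicon are all assumptions made for illustration, and unknown single characters are simply emitted as one-character words.

```python
# Hypothetical toy lexicon; a real system would load a large dictionary.
LEXICON = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


sentence = "研究生命起源"
print(forward_max_match(sentence, LEXICON))  # ['研究生', '命', '起源']
print(reverse_max_match(sentence, LEXICON))  # ['研究', '生命', '起源']
```

The two scans disagree on the span "研究生命". A span like this, where the candidate words overlap, is the kind of ambiguity a follow-up disambiguation step has to resolve, for example by comparing binding degrees as the abstract describes. That step is not shown here.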