
The Improved Extraction Word Model and Its Implementation Based on Word Boundary Characteristics

Author LiuYang
Tutor OuYangLiuBo; GanZhi
School Hunan University
Course Software Engineering
Keywords Chinese word extraction; word extraction model; word boundary; heuristic algorithm; boundary filtering algorithm
CLC TP391.1
Type Master's thesis
Year 2013

Chinese word extraction is one of the most basic tasks in Chinese information processing. Traditional Chinese word extraction techniques are mainly statistical and have achieved good results, but there is still room for improvement. This thesis therefore proposes an improved word extraction model.

First, by summarizing traditional word extraction algorithms, a basic word extraction model is designed. The model defines standard processes and module functions for concepts such as the feature selection strategy, the word evaluation strategy, the word selection strategy, and the filtering algorithm. An analysis of the basic principles of word extraction then identifies several key points at which the basic model can be improved.

Building on the basic model, corresponding evaluation criteria are introduced, and the thesis designs an improved feature selection strategy, a matching word selection strategy and its implementation, an improved filtering algorithm, and a heuristic algorithm for low-frequency words. On this theoretical basis, the concept of multi-step iterative word extraction is proposed, and a concrete implementation of the improved word extraction model, based on a feature set of word boundaries, is designed for it.

The experiments use the training data provided by Bake-off 2005. They first analyze the feasibility of using word boundary features as word extraction features, then apply the word selection algorithm to generate a reasonable set of candidate words, screen the candidates with the word filtering algorithm, and finally use the heuristic algorithm to mine further potential words from the corpus. The results show that the method improves accuracy, recall, and F-measure to some extent.
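
The abstract does not say which word boundary features the thesis uses, so the following is only a minimal sketch of the general idea, not the author's model. It assumes left and right branching entropy as the boundary feature (a common choice in statistical word extraction) and reads "accuracy" as precision when computing the F-measure. The function names, the thresholds, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(neighbors):
    """Shannon entropy (in bits) of the characters seen next to a candidate."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def extract_candidates(corpus, max_len=4, min_freq=2, min_entropy=1.0):
    """Return the n-grams that pass a frequency filter and a boundary filter.

    Assumption: a string is a plausible word when the characters on both
    sides of it vary a lot, i.e. both boundary entropies are high.
    """
    freq = Counter()
    left = defaultdict(Counter)   # candidate -> counts of left neighbors
    right = defaultdict(Counter)  # candidate -> counts of right neighbors
    for sentence in corpus:
        n = len(sentence)
        for i in range(n):
            # Candidates are substrings of 2..max_len characters.
            for j in range(i + 2, min(i + max_len, n) + 1):
                gram = sentence[i:j]
                freq[gram] += 1
                left[gram][sentence[i - 1] if i > 0 else "<s>"] += 1
                right[gram][sentence[j] if j < n else "</s>"] += 1
    return {
        gram for gram, count in freq.items()
        if count >= min_freq
        and boundary_entropy(left[gram]) >= min_entropy
        and boundary_entropy(right[gram]) >= min_entropy
    }


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and balanced F-measure over word types."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy corpus (hypothetical): "北京" recurs with varied neighbors on both sides.
corpus = ["我爱北京", "他去北京", "北京很大", "北京真美"]
words = extract_candidates(corpus)
scores = precision_recall_f1(words, {"北京", "很大"})
```

The thresholds `min_freq` and `min_entropy` are illustrative only. A frequency cutoff like this one discards rare words, which is presumably the gap that the heuristic step for low-frequency words described above is meant to close.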
