
Research on Cross-document Coreference of Chinese Person Name

Author NiJi
Tutor ZhuQiaoMing; KongFang
School Suzhou University
Course Computer Software and Theory
Keywords Cross-document Coreference Resolution; Chinese Person Name Recognition; Vector Space Model; Hierarchical Clustering
CLC TP391.1
Type Master's thesis
Year 2011

Cross-document coreference resolution of Chinese person names is the task of distinguishing occurrences of the same person name across different Chinese documents and linking each occurrence to the real person it refers to. It plays an important role in natural language processing and is a crucial component of information retrieval, information extraction, multi-document summarization, and other application systems. A complete cross-document coreference process can be decomposed into two parts, coreference and entity clustering, and most current research focuses on the latter. After analyzing the main tasks and research priorities of cross-document coreference of Chinese person names, this dissertation concentrates on the following three aspects.

First, it addresses Chinese person name recognition, the step that precedes cross-document coreference. It combines an integrated probability of cohesion and discrimination with the trustworthiness of boundary templates into a single name trustworthiness score, and compares this score against a threshold to recognize person names in the text.

Second, it computes the similarity between entities that share the same name based on the Vector Space Model. Each entity is represented by terms from the document that contains the entity name. For the news corpus, the entities are first classified according to the person's identity, and different kinds of terms are then selected as features to represent different kinds of persons. This improves the accuracy of the mapping from entities to vectors.

Finally, it applies single-link hierarchical clustering to distinguish entities that share the same name, analyzes the problems of this clustering method, and attempts to propose a solution that improves clustering performance.
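
The abstract does not give the formula for the name trustworthiness score used in the recognition step. The following Python sketch shows one plausible reading under stated assumptions: the integrated probability is taken as the product of precomputed cohesion and discrimination probabilities, boundary-template trust is the highest trust among matching left/right context words, and the two are mixed linearly before thresholding. The `Candidate` class, the template tables, the weights `w_prob`/`w_bound`, and the 0.5 threshold are all hypothetical illustrations, not the dissertation's actual values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str              # candidate person-name string
    left: str              # word immediately before the candidate
    right: str             # word immediately after the candidate
    cohesion: float        # assumed precomputed cohesion probability in [0, 1]
    discrimination: float  # assumed precomputed discrimination probability in [0, 1]


# Hypothetical boundary templates: context word -> trustworthiness in [0, 1].
LEFT_TEMPLATES = {"记者": 0.9}                # "reporter <name>"
RIGHT_TEMPLATES = {"说": 0.8, "教授": 0.85}   # "<name> said", "<name> professor"


def boundary_trust(c: Candidate) -> float:
    """Highest trust among the boundary templates that match (0 if none do)."""
    return max(LEFT_TEMPLATES.get(c.left, 0.0), RIGHT_TEMPLATES.get(c.right, 0.0))


def name_trust(c: Candidate, w_prob: float = 0.6, w_bound: float = 0.4) -> float:
    """Assumed combination: integrated probability = cohesion * discrimination,
    linearly mixed with boundary-template trust (weights are hypothetical)."""
    integrated = c.cohesion * c.discrimination
    return w_prob * integrated + w_bound * boundary_trust(c)


def recognize(cands, threshold: float = 0.5):
    """Return the candidates whose name trustworthiness reaches the threshold."""
    return [c.text for c in cands if name_trust(c) >= threshold]


cands = [
    Candidate("王小明", left="记者", right="报道", cohesion=0.8, discrimination=0.7),
    Candidate("张家口", left="", right="市", cohesion=0.6, discrimination=0.2),
]
print(recognize(cands))  # ['王小明']
```

The first candidate, preceded by 记者 ("reporter"), scores 0.6 × 0.56 + 0.4 × 0.9 = 0.696 and is accepted; the place name 张家口 (Zhangjiakou), followed by 市 ("city"), matches no template and scores 0.6 × 0.12 = 0.072, so it is rejected.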

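The similarity and clustering steps can be sketched together. This is a minimal Python sketch assuming raw term-frequency vectors, cosine similarity, and a fixed merge threshold as the stopping criterion; the abstract does not specify the term weighting, the similarity measure, or how clustering is stopped. The per-category vocabularies in `FEATURES` and the category labels passed to `entity_vectors` are hypothetical stand-ins for the dissertation's identity-based feature selection on the news corpus.

```python
import math
from collections import Counter

# Hypothetical per-category feature vocabularies: entities are assumed to be
# pre-classified by identity, and each category uses its own kind of terms.
FEATURES = {
    "athlete": {"足球", "比赛", "冠军", "队"},   # football, match, champion, team
    "official": {"市长", "政府", "会议"},         # mayor, government, meeting
}


def entity_vectors(docs, categories):
    """Term-frequency vector per entity, restricted to its category's features
    (all terms are kept when the category has no vocabulary)."""
    vectors = []
    for tokens, cat in zip(docs, categories):
        vocab = FEATURES.get(cat)
        vectors.append(Counter(t for t in tokens if vocab is None or t in vocab))
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_link(vectors, threshold):
    """Agglomerative single-link clustering: repeatedly merge the two clusters
    whose closest members are most similar, until that similarity < threshold."""
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: cluster similarity = max over member pairs.
                s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid
    return clusters


# Three toy documents mentioning the same name "李明".
docs = [
    ["李明", "足球", "比赛", "冠军", "队"],
    ["李明", "队", "足球", "进球"],
    ["李明", "市长", "政府", "会议"],
]
vecs = entity_vectors(docs, ["athlete", "athlete", "official"])
print(single_link(vecs, threshold=0.3))  # [[0, 1], [2]]
```

The two football documents share two feature terms and have cosine similarity 2 / (2 × √2) ≈ 0.71, so they merge into one entity, while the mayor document stays separate. Because single-link merges two clusters as soon as any pair of their members is similar enough, it is known to be prone to chaining: a sequence of pairwise-similar documents can join otherwise dissimilar entities into one cluster.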