Research on Cross-document Coreference of Chinese Person Name |
Author | NiJi |
Tutor | ZhuQiaoMing;KongFang |
School | Suzhou University |
Course | Computer Software and Theory |
Keywords | Cross-document Coreference Resolution; Chinese Person Name Recognition; Vector Space Model; Hierarchical Cluster |
CLC | TP391.1 |
Type | Master's thesis |
Year | 2011 |
Downloads | 45 |
Quotes | 0 |
Cross-document coreference of Chinese person names is the task of distinguishing identical person names that occur in different Chinese articles and linking them to real persons. It plays an important role in natural language processing and is a crucial part of information retrieval, information extraction, multi-document summarization, and other application systems. A complete cross-document coreference process can be decomposed into two parts: coreference and entity clustering. Most current research focuses on the latter.

Based on an analysis of the main task and research priorities of cross-document coreference of Chinese person names, this dissertation focuses on the following aspects.

First, it addresses Chinese person name recognition, the step that precedes cross-document coreference. It combines the integrated probability of cohesion, discrimination and the trustworthiness of boundary templates into a trustworthiness value for a name, and applies a threshold to this value to recognize person names in the text.

Second, it computes the similarity of entities that share the same name based on the Vector Space Model. Each entity is represented by terms from the document that contains the entity name. For the news corpus, the collection of entities is classified in advance according to the identity of the person, and different kinds of terms are then selected as features to represent different kinds of persons. This is intended to increase the accuracy of the transformation from entity to vector.

Finally, it adopts a single-link hierarchical clustering method to distinguish entities with the same name, analyzes the problems of this clustering method, and attempts to propose a solution that improves the clustering results.
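
The last two steps of the abstract, Vector Space Model similarity followed by single-link hierarchical clustering, can be illustrated with a minimal sketch. It is not the dissertation's implementation: the term-frequency weighting, the cosine measure, the function names, the toy terms, and the 0.3 stopping threshold are all assumptions chosen for illustration, and real Chinese text would first need word segmentation and the identity-dependent feature selection the abstract describes.

```python
import math
from collections import Counter


def to_vector(context_terms):
    """Represent one name mention as a term-frequency vector of its document's terms."""
    return Counter(context_terms)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_link_clusters(vectors, threshold):
    """Agglomerative single-link clustering.

    Repeatedly merge the two clusters whose closest pair of members is most
    similar, and stop once no pair of clusters reaches the threshold.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster similarity is the maximum member similarity.
                sim = max(cosine(vectors[i], vectors[j])
                          for i in clusters[a] for j in clusters[b])
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy mentions of one shared name; the terms and the threshold are invented.
mentions = [
    ["professor", "university", "research", "paper"],
    ["university", "research", "lecture"],
    ["striker", "goal", "match", "league"],
    ["match", "league", "coach"],
]
vectors = [to_vector(m) for m in mentions]
clusters = single_link_clusters(vectors, threshold=0.3)
print(clusters)  # [[0, 1], [2, 3]]
```

Each resulting cluster groups the mentions that the sketch treats as the same real person. Mentions 0 and 1 share two academic terms and mentions 2 and 3 share two sports terms, while the two groups share no terms at all, so the merge stops with two clusters.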