Research on Coreference Resolution Based on the Maximum Entropy Model |
|
Author | PangNing |
Advisor | YangErHong |
School | Shanxi University |
Major | Computer Application Technology |
Keywords | Maximum Entropy Model; Coreference Resolution; Corpus; Natural Language Processing |
CLC | TP391.1 |
Type | Master's thesis |
Year | 2007 |
Downloads | 132 |
Citations | 0 |
With the explosive growth of information and the widespread use of discourse processing technology, anaphora resolution has taken on unprecedented importance and has become a hot topic in natural language processing research. Coreference resolution is an extremely important subtask of anaphora resolution and has great practical and social value. In news reports on emergency events, coreference is a common phenomenon that occurs frequently in both discourse and dialogue. Using coreference keeps a report from becoming cumbersome and makes it concise and clear. Coreference resolution is a fundamental task of information extraction. On the one hand, it combines a variety of natural language processing techniques, such as part-of-speech tagging and noun phrase recognition; on the other hand, it is an important component of natural language processing applications: text processing tasks such as information extraction and question answering cannot avoid coreference resolution. Building on an in-depth analysis of the characteristics of coreference in emergency-event reports, this thesis adopts a corpus-based machine-learning resolution model. It uses the maximum entropy model to explore coreference resolution in Chinese news reports on emergency events; the goal is to extract the nouns, pronouns, and noun phrases in such reports that refer to the same entity. The model has the following characteristics: (1) Machine self-learning. The maximum entropy model is trained on an annotated corpus to produce the feature set, replacing the traditional practice of building feature sets by hand. (2) Easy extension. Features can be added or removed according to actual use and domain knowledge, which makes the system easier to port. (3) A certain degree of robustness. Because natural language processing technology is still imperfect and the feature attribute values depend mainly on natural language processing tools, errors are inevitable; nevertheless, the experimental results show that the algorithm is strongly resistant to noise.
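The mention-pair idea described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the feature names (`string_match`, `is_alias`, `number_agree`, `same_sentence`), the toy training pairs, and the training settings are all assumptions, since the abstract does not list the actual feature set. It relies on the fact that a two-class maximum entropy model has the same form as logistic regression, and fits the weights by plain gradient ascent.

```python
import math

# Hypothetical binary features for a (candidate antecedent, anaphor) pair.
# The abstract does not list the thesis's actual features.
FEATURES = ["string_match", "is_alias", "number_agree", "same_sentence"]

def featurize(pair):
    """Map a mention-pair dict to a 0/1 feature vector with a leading bias term."""
    return [1.0] + [1.0 if pair.get(name) else 0.0 for name in FEATURES]

def prob_coref(weights, x):
    """Two-class maximum entropy model: P(coref | x) = 1 / (1 + exp(-w.x))."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, labels, epochs=500, lr=0.5, l2=0.01):
    """Fit weights by gradient ascent on the L2-regularised log-likelihood."""
    xs = [featurize(p) for p in pairs]
    weights = [0.0] * (len(FEATURES) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, labels):
            err = y - prob_coref(weights, x)  # observed minus expected count
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w + lr * (g / len(xs) - l2 * w)
                   for w, g in zip(weights, grad)]
    return weights

# Toy, made-up training pairs: label 1 means the two mentions corefer.
TRAIN_PAIRS = [
    {"string_match": True, "number_agree": True},
    {"is_alias": True, "number_agree": True},
    {"number_agree": True, "same_sentence": True},
    {"same_sentence": True},
    {},
    {"string_match": True, "is_alias": True, "number_agree": True},
]
TRAIN_LABELS = [1, 1, 0, 0, 0, 1]

weights = train(TRAIN_PAIRS, TRAIN_LABELS)
likely = prob_coref(weights, featurize({"string_match": True, "number_agree": True}))
unlikely = prob_coref(weights, featurize({"same_sentence": True}))
```

Adding or removing an entry in `FEATURES` is all it takes to change the feature set in this sketch, which is one way to picture the "easy extension" property claimed above.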
This thesis presents a preliminary study of coreference in Chinese news reports on emergency events, describes in detail the learning and implementation of the maximum-entropy-based coreference resolution model, and tests and evaluates the algorithm comprehensively. We annotated a corpus of 20 million words for training and testing. The F-value is 64.6% in the closed test and 59.98% in the open test. The experimental results show that the model is effective for resolving coreference in emergency-event reports, and that it performs especially well on personal pronouns and on mention pairs that are aliases or short forms of each other. The thesis also analyzes the types of error that affect the model, including part-of-speech tagging errors, noun phrase recognition errors, and feature attribute value errors. Finally, it points out directions for future research, namely introducing syntactic features into coreference resolution and combining the model with the ACE evaluation, which lays a foundation for further work.
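For readers unfamiliar with the F-value, the following sketch shows the usual definition as the weighted harmonic mean of precision and recall. The abstract does not say which scoring scheme or counting unit the thesis uses, so the function name `f_value`, its arguments, and the counts below are assumptions for illustration only and are not the thesis's results.

```python
def f_value(correct, predicted, gold, beta=1.0):
    """Return (precision, recall, F) from counts of coreference decisions.

    correct   -- decisions made by the system that are also in the gold standard
    predicted -- all decisions made by the system
    gold      -- all decisions in the gold standard
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Made-up counts for illustration; they are not taken from the thesis.
p, r, f = f_value(correct=60, predicted=100, gold=80)
```

With `beta=1.0` precision and recall are weighted equally, which is the most common reading of an unqualified "F-value".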