Dynamic Generalization Mechanisms for Coreference Resolution

Author LiYaoBing
Advisor LiuTing
School Harbin Institute of Technology
Major Computer Science and Technology
Keywords Coreference resolution; Example; Generalization point; Qualified word; Structural characteristics
CLC TP391.1
Type Master's thesis
Year 2010

Coreference resolution is a core task in natural language processing and is important for discourse analysis, automatic summarization, information extraction, information retrieval, information filtering, and machine translation. This thesis applies an instance-based dynamic generalization mechanism to coreference resolution in English. The core idea of the mechanism is to find the training instances most similar to a test instance and to predict the test instance's class label from the distribution of positive and negative examples among those most similar training instances. Building on this idea, we propose the concept of a generalization point and design two basic algorithms for the dynamic generalization mechanism. The thesis studies two types of mechanism: dynamic generalization based on flat features and dynamic generalization based on complex features. For the flat-feature mechanism, the work addresses two problems that the basic algorithms leave open: the criterion for selecting the optimal generalization point and the computation of the confidence of the positive class. We propose five criteria for selecting the optimal generalization point and define the positive-class confidence as a piecewise linear function of the proportion of positive examples. Experimental results show that, with the proposed selection criteria and confidence definition, the flat-feature mechanism achieves results on the English corpus comparable to those of three traditional machine learning methods. Complex features are features whose values are of character-sequence type or structure type.
The study of the complex-feature mechanism is divided into two sub-tasks. (1) Dynamic generalization based on head-word features. The thesis introduces the head words of the antecedent and the anaphor as new features of character-sequence type. Based on an error analysis of the basic dynamic generalization algorithms, we propose a competition model to capture named entity recognition errors and mutually exclusive head-word matches. Experimental results show that, with the competition model, the head-word mechanism yields a significant improvement on the English corpus, while results on the Chinese corpus need further improvement. (2) Dynamic generalization based on structural features. The thesis introduces the Simple-Expansion tree as a new feature of structure type. It presents two tree pruning strategies to solve the problem of matching structural generalization points, and again uses the competition model to integrate the tree-structure feature into the dynamic generalization mechanism. Experimental results show that, with the competition model, the structural-feature mechanism does not perform well on the English corpus, and the exploitation of structural features still needs further work.
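
The core idea of the flat-feature mechanism (match a test instance against training instances, generalize when nothing matches, then score by the share of positive examples) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not define a generalization point precisely, so the sketch assumes it is the set of features that must still match after the others have been dropped. The feature names, the fixed back-off order, the `min_support` cutoff, and the 0.25 and 0.75 thresholds of the piecewise linear function are all hypothetical, and none of the five selection criteria from the thesis is reproduced here.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]          # flat feature name -> value (hypothetical features)
Labeled = Tuple[Instance, bool]    # (mention-pair features, is_coreferent)


def matching(train: List[Labeled], test: Instance, features: List[str]) -> List[Labeled]:
    """Return training instances that agree with the test instance on every listed feature."""
    return [(x, y) for x, y in train if all(x.get(f) == test.get(f) for f in features)]


def positive_confidence(p: float, low: float = 0.25, high: float = 0.75) -> float:
    """Piecewise linear map from positive proportion p to a confidence in [0, 1].

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    if p <= low:
        return 0.0
    if p >= high:
        return 1.0
    return (p - low) / (high - low)


def classify(train: List[Labeled], test: Instance,
             feature_order: List[str], min_support: int = 2) -> Tuple[List[str], float]:
    """Drop features from the end of feature_order until enough training instances match.

    The surviving feature list plays the role of the (assumed) generalization point.
    """
    for k in range(len(feature_order), -1, -1):
        kept = feature_order[:k]
        hits = matching(train, test, kept)
        if len(hits) >= min_support:
            positives = sum(1 for _, label in hits if label)
            return kept, positive_confidence(positives / len(hits))
    return [], 0.0  # too little training data to decide


# Toy data with made-up features for antecedent/anaphor pairs.
train = [
    ({"gender": "m", "number": "sg", "string_match": "no"}, True),
    ({"gender": "m", "number": "sg", "string_match": "yes"}, True),
    ({"gender": "m", "number": "pl", "string_match": "no"}, False),
    ({"gender": "f", "number": "sg", "string_match": "no"}, False),
]
test = {"gender": "m", "number": "sg", "string_match": "partial"}
kept, conf = classify(train, test, ["gender", "number", "string_match"])
print(kept, conf)
```

In this toy run no training instance matches all three features, so the sketch drops `string_match` and scores the test pair from the two training pairs that share its gender and number. A fixed back-off order is the simplest possible choice; the thesis instead selects the optimal generalization point by one of its proposed criteria.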
