Research on Technologies of Chinese Coreference Resolution Based on Domain Ontology

Author ShiShuMin
Tutor HuangHeYan
School Nanjing University of Technology and Engineering
Course Applied Computer Technology
Keywords Coreference Resolution; Named Entity Recognition; Domain Ontology; Ontology-Instance-Based POS Template; Domain Semantic Class Feature; Machine Learning; Zero Anaphora
CLC TP391.1
Type PhD thesis
Year 2008

Coreference is a ubiquitous phenomenon in natural-language discourse and dialogue. It makes a topic more prominent and expression more concise and coherent, yet it introduces ambiguity for Natural Language Processing (NLP). Coreference Resolution (CR) is the process of eliminating the ambiguity that coreferential expressions create. With the growing demand for processing real-world discourse, CR has become more important than ever and is now an active research area in NLP.

In this dissertation, the author studies Chinese Coreference Resolution and Named Entity Recognition, aiming to explore the concrete ways in which domain ontology can support these tasks and the effects it has, and validates the conclusions with machine learning methods. The dissertation emphasizes methodological research combined with empirical analysis, proposes new methods based on domain ontology, and makes the following contributions:

Firstly, a Two-Phase Step Up (TPSU) method is proposed for constructing domain ontologies. TPSU divides ontology construction into two phases and six levels. Within this process, a Triple Model Rule (TMR) is proposed to handle the transition from a single tree-like structure to a multi-element, net-like structure. Finally, TPSU enriches and completes the knowledge framework of the domain ontology by establishing instances. TPSU and TMR are intuitive and practical, and can be applied to the construction of other domain ontologies with similar characteristics.

Secondly, a Mobile Phone Ontology (MPO) is constructed, consisting of 12 core concepts, 78 cascading attributes, 13 relations among these concepts and attributes, and 4,239 instances. To our knowledge, no comparable domain ontology library existed before this work.
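The abstract does not say how the TMR represents the net-like structure. One common approach, assumed here, is to store every link as a (subject, predicate, object) triple, so that relations crossing branches of the concept tree can coexist with the original hierarchy. The following is a minimal Python sketch under that assumption; `TREE`, `TripleStore`, and all concept, relation, and instance names are hypothetical and do not come from the MPO.

```python
from collections import defaultdict

# Hypothetical miniature of a mobile-phone ontology; names are illustrative
# and are NOT the actual MPO concepts, attributes, or relations.
TREE = {
    "MobilePhone": ["Screen", "Battery", "Camera"],  # concept -> attributes
    "Manufacturer": ["Country"],
}


def tree_to_triples(tree):
    """Flatten a concept -> attribute tree into (subject, predicate, object) triples."""
    return [(concept, "hasAttribute", attr)
            for concept, attrs in tree.items()
            for attr in attrs]


class TripleStore:
    """A tiny in-memory graph in which any node may link to any other node."""

    def __init__(self, triples=()):
        self.by_subject = defaultdict(set)
        for t in triples:
            self.add(*t)

    def add(self, s, p, o):
        self.by_subject[s].add((p, o))

    def objects(self, s, p):
        """Return the sorted objects linked from subject s by predicate p."""
        return sorted(o for (pred, o) in self.by_subject[s] if pred == p)


store = TripleStore(tree_to_triples(TREE))
# A cross-branch relation that a pure tree cannot express.
store.add("MobilePhone", "madeBy", "Manufacturer")
# Instance-level triples, loosely analogous to an instance-building phase.
store.add("PhoneX1", "instanceOf", "MobilePhone")
store.add("PhoneX1", "madeBy", "AcmeCorp")
store.add("AcmeCorp", "instanceOf", "Manufacturer")

print(store.objects("MobilePhone", "hasAttribute"))  # ['Battery', 'Camera', 'Screen']
print(store.objects("PhoneX1", "madeBy"))            # ['AcmeCorp']
```

Keeping hierarchy edges and cross-links in one triple form means new relation types can be added without restructuring the tree.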
Almost all core concepts in the MPO can be reused directly; this extensibility and practicality reflect the shareability of ontology knowledge.

Then, named entities are classified into general named entities and domain named entities (DNEs). For Domain Named Entity Recognition (DNER), to examine the support that domain ontology provides, an algorithm is proposed that acquires part-of-speech (POS) templates from the MPO. Combined with a Conditional Random Fields (CRFs) model, it improves DNER performance, reaching an F-measure of 92.36%. Contrast experiments show that these templates are highly stable and improve precision, and that they are especially effective at recognizing the boundaries and special forms of DNEs.

Next, to explore the effect of domain ontology on Chinese CR, a new method of acquiring semantic class features from the MPO is proposed. Through automatic feature annotation, noun phrases treated as antecedent candidates receive corresponding semantic class features. Combined with a Decision Tree model, this method improves CR performance on DNEs, reaching an F-measure of 86.49%, which is 7.36% higher than without this feature.

Finally, because current research on Chinese zero-pronoun coreference is mainly confined to linguistics and psychology, a model comprising three operable algorithms is proposed. Building on the above results, this model performs Zero Coreference Resolution by identifying the antecedents of zero pronouns in coreference segments and filling those antecedents into the ellipsis positions.
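As a minimal sketch of how ontology instances might be used to annotate antecedent candidates with semantic class features automatically, the code below assumes the instances have been flattened into a surface-form-to-class lexicon (`LEXICON`, hypothetical) and uses exact-match lookup. The feature names are also illustrative, not the dissertation's. The resulting dictionaries would be the input to a decision-tree learner, which is not shown.

```python
# Hypothetical lexicon derived from ontology instances: surface form -> semantic class.
LEXICON = {
    "PhoneX1": "MobilePhone",
    "this phone": "MobilePhone",
    "AcmeCorp": "Manufacturer",
    "the company": "Manufacturer",
    "the battery": "Battery",
}


def semantic_class(np):
    """Annotate a noun phrase with its semantic class; exact lookup, else 'UNKNOWN'."""
    return LEXICON.get(np, "UNKNOWN")


def pair_features(antecedent, anaphor):
    """Build a small feature dict for one (antecedent candidate, anaphor) pair."""
    a_cls, b_cls = semantic_class(antecedent), semantic_class(anaphor)
    return {
        "ante_class": a_cls,
        "ana_class": b_cls,
        # Agreement is asserted only when both classes are known and equal.
        "sem_class_agree": a_cls != "UNKNOWN" and a_cls == b_cls,
        "string_match": antecedent == anaphor,
    }


for cand in ["AcmeCorp", "PhoneX1"]:
    print(cand, pair_features(cand, "this phone")["sem_class_agree"])
# AcmeCorp False
# PhoneX1 True
```

Requiring both classes to be known keeps two unannotated phrases from being counted as agreeing merely because both fall back to `UNKNOWN`.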

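This abstract does not describe the three algorithms of the zero-coreference model. The sketch below only illustrates the final step of filling a resolved antecedent into the ellipsis position. It uses a deliberately simple stand-in resolver (nearest preceding candidate) in place of the dissertation's method. The `*pro*` marker, the token list, and the English example sentence are all hypothetical; real input would be segmented Chinese text.

```python
ZERO = "*pro*"  # hypothetical marker for an elided (zero) pronoun position


def resolve_nearest(tokens, candidates):
    """Stand-in resolver: link each zero pronoun to the nearest preceding candidate."""
    links, last = {}, None
    for i, tok in enumerate(tokens):
        if tok in candidates:
            last = tok
        elif tok == ZERO and last is not None:
            links[i] = last
    return links


def fill(tokens, links):
    """Replace each resolved zero-pronoun marker with its antecedent; keep other tokens."""
    return [links.get(i, tok) for i, tok in enumerate(tokens)]


tokens = ["PhoneX1", "launched", "in", "2008", ",", ZERO, "sold", "well"]
links = resolve_nearest(tokens, {"PhoneX1"})
print(" ".join(fill(tokens, links)))
# PhoneX1 launched in 2008 , PhoneX1 sold well
```

Keeping resolution (`resolve_nearest`) separate from filling (`fill`) means a stronger resolver can be swapped in without changing the filling step.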