Research on Chinese Entity Relation Extraction Based on the Maximum Entropy Model |
Author | ZhangYiHao |
Advisor | GuoJianYi |
School | Kunming University of Science and Technology |
Major | Computer Application Technology |
Keywords | Named entity; Coreference resolution; Entity relation extraction; Information extraction; Maximum entropy model |
CLC | TP391.1 |
Type | Master's thesis |
Year | 2010 |
Downloads | 93 |
Citations | 0 |
Entity relation extraction identifies the semantic relations between pairs of entities in domain-specific text and stores them in a structured form. It has a wide range of applications in information retrieval and automatic question answering, and as a key technology in information extraction it is attracting more and more attention. Entities mainly describe an object or a collection of objects of a certain nature, while entity relations capture the explicit or implicit semantic links between different entities. The performance of an entity relation extraction system depends on several factors: whether entities are detected properly, whether entity types are determined correctly, and whether the relation types between entities are judged correctly. A relatively complete relation extraction system usually consists of five modules connected in sequence: NLP preprocessing, named entity recognition, pattern matching or classification, coreference resolution, and the processing and standardized output of new relations. To build such a system, this thesis proposes using the maximum entropy method for entity relation extraction and divides the system into three modules connected in sequence: named entity recognition, coreference resolution, and entity relation extraction. The main results and contributions are as follows:
1) Named entity recognition: As the preliminary step of a relation extraction system, entity recognition is an important component. This thesis uses the CRFs machine learning algorithm, taking into account features such as the words and parts of speech within a window of a certain size around the entity, to recognize seven categories of entities (person, organization, GPE, location, vehicle, facility, and weapon), and obtains good results.
2) Coreference resolution: A named entity may appear several times in the same text or sentence, in varied forms, so the relation between the same entities is often extracted repeatedly. To address this problem in relation extraction, the thesis proposes a method that extracts feature vectors by means of rules and trains a model with the SVM classifier machine learning algorithm to resolve coreference between entities.
3) Entity relation extraction based on the maximum entropy model: This part is the main work of the thesis. The feature set is built from word, part-of-speech, and entity features, together with combined features tailored to relation extraction. During feature construction, stop-word removal is applied and coreference resolution is used to post-process the named entities, which avoids extracting the same relation between entities repeatedly. In applying the maximum entropy model to automatic entity relation extraction, experiments show that the maximum entropy algorithm does not markedly improve the final result relative to other supervised machine learning algorithms; they also verify that, on top of the word, part-of-speech, and entity features, stop-word removal and combined features are extremely useful for classification, and good results are ultimately achieved.
4) Demo: The system integrates the three sequentially connected modules (named entity recognition, coreference resolution, and entity relation extraction) to extract entities and their relations automatically. Finally, three sets of experiments were designed to test these modules.
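The relation extraction step in item 3) can be illustrated with a minimal sketch of a maximum entropy (multinomial logistic) classifier over entity pairs. This is not the thesis's implementation. The English toy sentences, the relation labels, the feature names, the stop-word list, and the training hyperparameters are all assumptions made for illustration, and part-of-speech features are omitted because no tagger is assumed here. The model is trained with plain stochastic gradient ascent, which is only one of several ways to fit such a model.

```python
import math
from collections import defaultdict

STOP_WORDS = {"the", "a", "of", "in", "is", "at", "by"}  # hypothetical stop list

def extract_features(tokens, e1, e2):
    """Build binary features for an entity pair; e1 and e2 are (token index, entity type)."""
    (i1, t1), (i2, t2) = e1, e2
    feats = [f"e1_word={tokens[i1]}", f"e2_word={tokens[i2]}",
             f"e1_type={t1}", f"e2_type={t2}",
             f"type_pair={t1}-{t2}"]  # combined feature over both entity types
    for w in tokens[i1 + 1:i2]:
        if w.lower() not in STOP_WORDS:  # stop-word removal
            feats.append(f"between={w.lower()}")
    return feats

class MaxEnt:
    """Maximum entropy classifier: p(y|x) is proportional to exp(sum of weights of active (feature, y) pairs)."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        m = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: v / z for y, v in exps.items()}

    def train(self, data, epochs=200, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the log-likelihood,
        # with light L2 shrinkage applied to the active features only.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    g = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.w[(f, y)] += lr * (g - l2 * self.w[(f, y)])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

# Toy training data (invented for this sketch): tokens, entity 1, entity 2, relation label.
TRAIN = [
    ("Zhang works at Huawei".split(), (0, "PER"), (3, "ORG"), "EMPLOYMENT"),
    ("Li is employed by Lenovo".split(), (0, "PER"), (4, "ORG"), "EMPLOYMENT"),
    ("Huawei is located in Shenzhen".split(), (0, "ORG"), (4, "GPE"), "LOCATED_IN"),
    ("Lenovo is based in Beijing".split(), (0, "ORG"), (4, "GPE"), "LOCATED_IN"),
    ("Wang visited Kunming".split(), (0, "PER"), (2, "GPE"), "NONE"),
    ("Zhao saw Shanghai".split(), (0, "PER"), (2, "GPE"), "NONE"),
]

data = [(extract_features(toks, e1, e2), label) for toks, e1, e2, label in TRAIN]
model = MaxEnt({label for _, label in data})
model.train(data)

test_feats = extract_features("Chen works at Baidu".split(), (0, "PER"), (3, "ORG"))
print(model.predict(test_feats))
```

In a pipeline like the one described, the entity positions and types passed to `extract_features` would come from the named entity recognition module, and coreferent mentions would be merged beforehand so that each entity pair is classified only once.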