
Coreference Resolution Based on Spectral Clustering

Author XieYongKang
Tutor WuLiDe
School Fudan University
Course Applied Computer Technology
Keywords Coreference Resolution, Spectral Clustering, Maximum Entropy
CLC TP391.1
Type Master's thesis
Year 2009

Reference resolution is one of the basic problems in Natural Language Processing. It is important in tasks such as Named Entity Detection and Tracking, Relation Extraction, and Question Answering. The coreference relation is an equivalence relation among reference relations. This thesis proposes an algorithm based on spectral clustering for the Chinese coreference resolution task. The approach has two steps. First, a maximum entropy classifier predicts whether two mentions belong to the same entity; the link probability given by the classifier is used as the coreference probability of the two mentions. Second, a spectral clustering algorithm is applied to a matrix derived from the Laplacian matrix, which is computed from the probabilities of the mention pairs. Finally, the clustering algorithm merges or splits mentions to form entities. The experiments use the ACE 2007 Chinese corpus, and the ACE evaluation tools are used to obtain the ACE value and the B-cubed value. We also compare spectral clustering with other commonly used clustering algorithms, namely transitive closure, closest link, best link, and Bell tree, to analyze their characteristics. The results show that spectral clustering is effective, with suitable parameters, on entity-subtype mention pairs. It generates entities from a global view and achieves an ACE value of 75.5% and a B-cubed F value of 82.0%. These scores exceed the best performance of the clustering algorithms mentioned above by 0.6% and 3.5% in ACE value and B-cubed F value respectively. However, because spectral clustering is sensitive to mention type and threshold, the other algorithms perform better than it in the Notype experiments.
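The second step can be illustrated with a minimal sketch. It assumes the pairwise link probabilities are already available as a symmetric matrix and performs only a single two-way split using the second eigenvector of the normalized Laplacian. The function name `spectral_bipartition` and the toy probabilities are hypothetical, and the thesis's actual matrix construction, number of clusters, and threshold handling are not given in the abstract, so this is not the author's implementation.

```python
import numpy as np

def spectral_bipartition(link_prob):
    """Split mentions into two groups with a normalized-Laplacian cut.

    link_prob: symmetric (n, n) array of pairwise coreference
    probabilities (assumed to come from a pairwise classifier).
    Returns an array of 0/1 group labels, one per mention.
    """
    W = np.asarray(link_prob, dtype=float)
    W = (W + W.T) / 2.0           # enforce symmetry
    np.fill_diagonal(W, 0.0)      # ignore self-links
    n = W.shape[0]

    degree = W.sum(axis=1)
    inv_sqrt = np.zeros(n)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric input
    _, eigvecs = np.linalg.eigh(laplacian)

    # Second-smallest eigenvector, mapped back by D^{-1/2}
    fiedler = eigvecs[:, 1] * inv_sqrt
    return (fiedler > 0).astype(int)

# Toy example: mentions 0-2 link strongly to each other, as do 3-4.
toy = np.array([
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
])
labels = spectral_bipartition(toy)
```

Which group receives label 0 or 1 depends on the sign of the eigenvector, so only the grouping is meaningful. The sketch also assumes a connected graph: if the links fall into disconnected components, the second eigenvector is not uniquely defined and the split is unreliable.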
We also briefly discuss the disadvantages of spectral clustering for coreference resolution. Methods for generating the threshold automatically are discussed at the end of the thesis; these methods may be helpful for future work in this field.
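For readers unfamiliar with the B-cubed measure reported above, the following is a minimal sketch of the standard per-mention definition. It assumes the gold and predicted clusterings cover exactly the same mentions. The function name `b_cubed` and the toy data are hypothetical, and the ACE evaluation tools may handle mention alignment differently.

```python
def b_cubed(gold, predicted):
    """Compute B-cubed precision, recall, and F1.

    gold, predicted: dicts mapping each mention id to an entity id.
    Both dicts are assumed to contain the same mention ids.
    """
    gold_clusters, pred_clusters = {}, {}
    for mention in gold:
        gold_clusters.setdefault(gold[mention], set()).add(mention)
        pred_clusters.setdefault(predicted[mention], set()).add(mention)

    precision = recall = 0.0
    for mention in gold:
        g = gold_clusters[gold[mention]]
        p = pred_clusters[predicted[mention]]
        overlap = len(g & p)
        precision += overlap / len(p)   # share of predicted cluster that is correct
        recall += overlap / len(g)      # share of gold cluster that was recovered

    n = len(gold)
    precision /= n
    recall /= n
    f1 = 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: the gold entity {a, b, c} is split and c is merged with d.
gold = {"a": 1, "b": 1, "c": 1, "d": 2}
pred = {"a": "x", "b": "x", "c": "y", "d": "y"}
p, r, f = b_cubed(gold, pred)
```

In this toy case the precision is 0.75 and the recall is 2/3, because the wrongly merged cluster {c, d} lowers precision while the split of {a, b, c} lowers recall.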
