Coreference Resolution Based on Spectral Clustering |
Author | XieYongKang |
Supervisor | WuLiDe |
School | Fudan University |
Major | Applied Computer Technology |
Keywords | Coreference Resolution; Spectral Clustering; Maximum Entropy |
CLC (Chinese Library Classification) | TP391.1 |
Type | Master's thesis |
Year | 2009 |
Coreference resolution is one of the basic problems in Natural Language Processing. It plays an important role in tasks such as Entity Detection and Tracking, Relation Extraction and Question Answering. Coreference is an equivalence relation among referring expressions (mentions): mentions that refer to the same entity form one equivalence class. This thesis proposes a spectral clustering based algorithm for Chinese coreference resolution, which works in two steps. First, a maximum entropy classifier predicts whether two mentions belong to the same entity, and the link probability it outputs is used as the coreference probability of that mention pair. Second, spectral clustering is applied to the Laplacian matrix computed from these pairwise probabilities, and the clustering merges or splits mentions into entities.
The experiments use the ACE 2007 Chinese corpus, and the ACE evaluation tools are used to obtain the ACE value and the B cubed value. We also compare spectral clustering with other commonly used clustering algorithms, namely transitive closure, closest link, best link and the Bell tree algorithm, and analyze their characteristics. The results show that spectral clustering is effective on entity subtype mention pairs when its parameters are suitably chosen. It generates entities from a global view of all mention pairs and reaches an ACE value of 75.5% and a B cubed F value of 82.0%, which exceed the best results of the other clustering algorithms by 0.6% and 3.5% respectively. However, because spectral clustering is sensitive to mention type and to the threshold, the other algorithms perform better than it in the Notype experiments.
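The B cubed F value reported above follows the standard B-cubed definition: for each mention, precision is the fraction of its response (system) entity that also lies in its key (gold) entity, recall is the fraction of its key entity recovered in its response entity, and both are averaged over mentions. The sketch below assumes equal weight per mention and that key and response contain the same mentions, as with gold mentions. The function name `b_cubed` is hypothetical, and this is not the ACE scoring tool used in the thesis.

```python
def b_cubed(key, response):
    """Hypothetical helper: B-cubed (precision, recall, F) with uniform
    per-mention weights. key and response are lists of entities, each an
    iterable of mention ids; both are assumed to cover the same mentions."""
    key_of = {m: frozenset(e) for e in key for m in e}
    resp_of = {m: frozenset(e) for e in response for m in e}
    mentions = key_of.keys() & resp_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention overlap between its key entity and its response entity.
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


key = [[1, 2, 3], [4, 5]]
response = [[1, 2], [3, 4, 5]]
print(tuple(round(x, 3) for x in b_cubed(key, response)))  # (0.733, 0.733, 0.733)
```

Because every mention is scored against its own entities, a single misplaced mention costs only a fraction of a point, which is one reason B-cubed and the ACE value can rank systems differently.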
The thesis also briefly discusses the disadvantages of spectral clustering for coreference resolution, and concludes with a discussion of methods for generating the threshold automatically, which may be helpful for future work in this field.
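The two-step method described in the abstract can be sketched in Python with NumPy. The abstract does not say which spectral clustering variant the thesis uses, so this sketch assumes the recursive two-way normalized cut (Ncut) of Shi and Malik: each group of mentions is split along the Fiedler vector, the second eigenvector of the normalized Laplacian, until the best split's Ncut value exceeds a threshold, so the number of entities does not have to be fixed in advance. The matrix `P` stands in for the maximum entropy classifier's pairwise probabilities, which are taken as given here. The function names, the self-affinity of 1 on the diagonal and the default `threshold=0.5` are illustrative assumptions, not details from the thesis.

```python
import numpy as np


def ncut_value(W, mask):
    """Normalized cut of the 2-way partition (mask, ~mask) of affinity matrix W."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()


def fiedler_vector(W):
    """Generalized eigenvector y for the second smallest eigenvalue of
    (D - W) y = lam * D y, computed via I - D^-1/2 W D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)      # eigenvalues come back in ascending order
    return vecs[:, 1] * d_inv_sqrt       # map z back to y = D^-1/2 z


def spectral_coreference(P, threshold=0.5):
    """Hypothetical helper: recursively bipartition mentions until the best
    split's Ncut exceeds `threshold`. P is an n x n matrix of pairwise
    coreference probabilities; returns entities as sorted lists of indices."""
    P = np.asarray(P, dtype=float)
    W = (P + P.T) / 2.0                  # symmetrize the pairwise probabilities
    np.fill_diagonal(W, 1.0)             # assumption: self-affinity of 1 per mention
    entities, stack = [], [np.arange(len(W))]
    while stack:
        idx = stack.pop()
        if len(idx) == 1:
            entities.append(idx.tolist())
            continue
        sub = W[np.ix_(idx, idx)]
        order = np.argsort(fiedler_vector(sub))
        # Sweep split points along the sorted Fiedler vector; keep the lowest Ncut.
        best_mask, best_ncut = None, np.inf
        for k in range(1, len(idx)):
            mask = np.zeros(len(idx), dtype=bool)
            mask[order[:k]] = True
            value = ncut_value(sub, mask)
            if value < best_ncut:
                best_mask, best_ncut = mask, value
        if best_ncut > threshold:
            entities.append(sorted(idx.tolist()))  # cohesive group: one entity
        else:
            stack.extend([idx[best_mask], idx[~best_mask]])
    return entities


P = [[1.0, 0.9, 0.05, 0.05],
     [0.9, 1.0, 0.05, 0.05],
     [0.05, 0.05, 1.0, 0.8],
     [0.05, 0.05, 0.8, 1.0]]
print(sorted(spectral_coreference(P)))  # [[0, 1], [2, 3]]
```

Setting each mention's self-affinity to 1 ties the Ncut value to the absolute link probabilities; without it Ncut is scale-invariant, and any two-mention group would score 2 whatever its probability. As a toy illustration of why the threshold matters, `spectral_coreference(P, threshold=0.05)` on the same matrix returns a single entity containing all four mentions.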