Research of Question Answering System Based on the Analysis of Lexical and Semantic Meanings
|School||Harbin Institute of Technology|
|Course||Computer Science and Technology|
|Keywords||Question Answering; Question analysis; Information retrieval; Answer extraction|
With the rapid development of information technology, the volume of accessible data has grown exponentially in recent years. Ordinary users want to locate the data and knowledge they need as quickly as possible, typically with tools such as search engines. Compared with traditional web search engines, Question Answering (QA) technologies aim to provide better results in a more natural way: users ask questions in natural language, and the QA system is expected to return refined results that contain the expected answers. A QA system can also provide supporting material when a question is too difficult to answer briefly.

The research in this dissertation focuses on three main parts of a typical QA system: Question Analysis, Paragraph Retrieval and Answer Extraction. In Chapter 2, we propose a new approach to classifying Chinese questions using interrogative words and the word senses of issue words. First, by defining question classification as a sequence labeling problem, we use a CRF model to label the interrogative words and issue words. Second, we try to resolve the word sense ambiguities among these words. The word senses are taken from a Chinese thesaurus, Tongyici Cilin, which is organized into five levels; sense information from the third and fifth levels is used for word sense disambiguation and, in turn, for question classification. Third, using the issue words and their parts of speech, we train SVMs to classify questions. The experiments show that the features of interrogative words and the word senses of issue words contribute to improved question classification in QA. In Chapter 3, we study paragraph retrieval, one of the crucial parts of a QA system, and integrate word sense information into a statistical retrieval model for paragraphs. In Chapter 4, we study the Answer Extraction and Generation problems in detail. With a new method based on Semantic Role Labeling, we increase the precision of candidate sentence selection and obtain better performance when a bag-of-words model is considered during the selection. In the final chapter, we describe the implementation details of our QA system.
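To make the bag-of-words model mentioned for candidate sentence selection concrete, the following is a minimal sketch of ranking candidate sentences against a question by cosine similarity over term counts. It is an illustration under stated assumptions, not the dissertation's implementation: the function names (`bag_of_words`, `cosine`, `rank_candidates`) are hypothetical, the input is assumed to be already tokenized (Chinese text would first need word segmentation), and cosine similarity is one common scoring choice that the abstract does not specify.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map a list of tokens to a term-frequency vector (hypothetical helper)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence shares nothing with the question
    return dot / (norm_a * norm_b)


def rank_candidates(question_tokens, candidate_sentences):
    """Return (score, sentence) pairs, highest similarity first.

    Assumption: each candidate sentence is a pre-tokenized list of words.
    """
    q = bag_of_words(question_tokens)
    scored = [(cosine(q, bag_of_words(s)), s) for s in candidate_sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy example with invented tokens, for illustration only.
question = ["what", "model", "labels", "issue", "words"]
candidates = [
    ["svms", "classify", "questions"],
    ["a", "crf", "model", "labels", "the", "issue", "words"],
]
ranked = rank_candidates(question, candidates)
```

In this toy example the second candidate shares four tokens with the question and is ranked first, while the first candidate shares none and scores zero. A score of this kind ignores word order and semantic roles, which is the sort of information a Semantic Role Labeling based method can add.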