
Research on Question Expansion Based on Paraphrasing

Author KangWeiPeng
Advisor ZhangYu
School Harbin Institute of Technology
Major Computer Application Technology
Keywords question expansion; query keywords; paraphrase phrases; language model
CLC TP391.2
Type Master's thesis
Year 2011

Question expansion means adding to the original query, in line with the user's query intent, words or phrases that help improve retrieval results, or rewriting and restructuring the query, so that the retrieval results better satisfy that intent. Question expansion is needed because query terms often fail to match index terms, a problem commonly known as keyword mismatch, whose roots lie in the flexibility, complexity, and diversity of natural language. Research on question expansion covers two aspects: building expansion resources and exploring expansion algorithms. This thesis studies paraphrase-based question expansion, investigates both aspects, and attempts to address keyword mismatch at the semantic level. On the one hand, it describes how to use online dictionaries to build a paraphrase-phrase expansion resource automatically; on the other hand, it explores new ways of applying paraphrase-phrase resources to question expansion and proposes three expansion algorithms based on language model checking.

Paraphrase phrases are acquired with multiple translators and dictionaries, treating paraphrase extraction as a statistical machine translation process. Using the translators and dictionaries, a source-language phrase is translated into a pivot (intermediate) language and then translated back into the source language, which establishes, through the pivot language, a translation model between source-language phrases. The method is simple and feasible: the acquired paraphrase phrases reach an accuracy of close to 70%, with an average of 6 paraphrases per phrase.

For questions, this thesis mainly studies keyword determination and keyword weighting. Keywords are determined with a combination of rules and statistics, and keyword weights are assigned with a statistical method.
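The pivot-language paraphrase extraction described above can be illustrated with a minimal sketch. The abstract does not state how candidates are scored, so this example assumes the common pivot formulation, in which the score of a candidate is the sum over pivot phrases of the forward translation probability times the backward translation probability. The tables `SRC_TO_PIVOT` and `PIVOT_TO_SRC`, their probabilities, and the function name `pivot_paraphrases` are hypothetical stand-ins for the output of the translators and dictionaries.

```python
from collections import defaultdict

# Hypothetical toy translation tables (assumption, not data from the thesis).
# Each maps a phrase to its candidate translations with made-up probabilities.
SRC_TO_PIVOT = {
    "purchase": {"acheter": 0.7, "acquerir": 0.3},
}
PIVOT_TO_SRC = {
    "acheter": {"buy": 0.6, "purchase": 0.4},
    "acquerir": {"acquire": 0.5, "purchase": 0.5},
}


def pivot_paraphrases(phrase, src_to_pivot, pivot_to_src, top_n=6):
    """Return up to top_n (paraphrase, score) pairs for a source phrase.

    Assumed scoring: score(e2 | e1) = sum over pivots f of p(f | e1) * p(e2 | f).
    """
    scores = defaultdict(float)
    for pivot, p_forward in src_to_pivot.get(phrase, {}).items():
        for candidate, p_backward in pivot_to_src.get(pivot, {}).items():
            if candidate != phrase:  # the phrase is not a paraphrase of itself
                scores[candidate] += p_forward * p_backward
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    for candidate, score in pivot_paraphrases("purchase", SRC_TO_PIVOT, PIVOT_TO_SRC):
        print(f"{candidate}\t{score:.2f}")
```

Summing over pivots rewards candidates that are reachable through several intermediate translations, which is one plausible reason for combining multiple translators and dictionaries; the default `top_n=6` only mirrors the average number of paraphrases reported in the abstract.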
Experiments show that, compared with the rule-based method, the keyword determination and weighting method used here improves accuracy by 3 percent.

This thesis proposes three question expansion methods that use paraphrase phrases: an N-Best synonymous question expansion algorithm based on language model checking, an N-Best synonymous phrase expansion algorithm based on language model checking, and an improved N-Best synonymous phrase expansion algorithm based on language model checking. The thesis explains the principles of the three algorithms and compares their performance experimentally. Experiments on the TREC9 evaluation set show that, compared with the original query, question expansion with paraphrase phrases raises recall by nearly 3 percent, and that the N-Best synonymous question expansion method based on language model checking performs best.
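The idea of checking candidate expansions with a language model and keeping the N best can be sketched as follows. The abstract does not specify the language model or the scoring details, so this example assumes an add-one smoothed bigram model trained on a toy corpus and ranks whole paraphrased questions by their average log-probability per bigram. All function names, the corpus, and the paraphrase list are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts for an add-one smoothed bigram model."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # counts of each bigram history
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(sentence.split())
    # One extra slot stands in for unseen words (a simplifying assumption).
    return histories, bigrams, len(vocab) + 1


def avg_log_prob(sentence, model):
    """Average log-probability per bigram under the smoothed model."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    total = sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + vocab_size))
        for a, b in pairs
    )
    return total / len(pairs)


def n_best_expansions(question, phrase, paraphrases, model, n=3):
    """Substitute each paraphrase for the phrase and keep the n best-scoring questions."""
    candidates = [question.replace(phrase, p) for p in paraphrases]
    scored = [(c, avg_log_prob(c, model)) for c in candidates]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n]


if __name__ == "__main__":
    corpus = [
        "where can i buy a car",
        "where can i buy a house",
        "how do i buy a ticket",
    ]
    model = train_bigram(corpus)
    best = n_best_expansions(
        "where can i purchase a car", "purchase", ["buy", "acquire"], model
    )
    for candidate, score in best:
        print(f"{score:.3f}\t{candidate}")
```

Normalizing by the number of bigrams keeps longer paraphrases from being penalized purely for their length; whether the thesis applies any such normalization is not stated in the abstract.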
