Research on Query Expansion Technique of Retrieval System in Biomedical Field
|School||Harbin Institute of Technology|
|Course||Computer Science and Technology|
|Keywords||Genome Track; TREC; Information Retrieval; Query Expansion; Relevance Feedback|
With the progress of computer technology and biotechnology, the biomedical literature is growing at an unprecedented rate. The well-known MEDLINE database has collected nearly 11 million biomedical articles since 1965 and is growing at a rate of 1,500 a day. These documents contain a great deal of knowledge: researchers can combine results reported in the literature to find relationships between diseases and genes, between genes and biological functions, and among different genes. If such knowledge were applied in practice, human diseases could be better diagnosed, prevented and treated. However, it is not feasible to obtain such knowledge from so massive a literature unaided. An information retrieval system designed for the massive biomedical literature has therefore become an urgent need of researchers in the field. In 2003, the TREC Genomics Track came into being.

This paper is based on the TREC Genomics Track 2007. First, TREC is briefly introduced, followed by the data source, the topics and the submission and evaluation format of the TREC Genomics Track 2007. Next, the information retrieval models in most common use today are discussed and analyzed, and the Indri toolkit, which is used in this paper to implement the retrieval module of the biomedical retrieval system, is introduced. Relevant documents may fail to be retrieved because the terminology used in the retrieval request does not match the terminology used in the document collection. To address this, the paper proposes two query expansion methods: the Regularized Synonym Expansion Method and the Feedback-Based Entity Query Expansion Method. Finally, the design, implementation and test results of the biomedical retrieval system are described.

The paper focuses mainly on two aspects, the information retrieval model and query expansion technology, with which a first implementation of a retrieval system for the biomedical field is built.
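To make the terminology-mismatch idea concrete, the following is a minimal sketch of synonym-based query expansion that emits a query in the Indri query language. It is not the Regularized Synonym Expansion Method itself, whose details are not given in this abstract. The function names `expand_query` and `render_variant`, the example terms and the synonym table are all hypothetical; only the Indri operators `#combine`, `#syn` and `#1` are taken from the Indri query language.

```python
def render_variant(phrase):
    """Render one term variant; multi-word variants become an exact-phrase #1(...) window."""
    words = phrase.split()
    if len(words) == 1:
        return words[0]
    return "#1(" + " ".join(words) + ")"


def expand_query(terms, synonyms):
    """Build an Indri query in which each term is grouped with its synonyms.

    terms    -- list of original query terms (hypothetical input format)
    synonyms -- dict mapping a term to a list of synonyms (hypothetical table)
    """
    parts = []
    for term in terms:
        # Keep the original term first, then add synonyms that differ from it.
        variants = [term] + [s for s in synonyms.get(term, []) if s != term]
        rendered = [render_variant(v) for v in variants]
        if len(rendered) == 1:
            parts.append(rendered[0])
        else:
            # #syn treats all enclosed expressions as occurrences of one term.
            parts.append("#syn(" + " ".join(rendered) + ")")
    return "#combine(" + " ".join(parts) + ")"


if __name__ == "__main__":
    # Illustrative synonym table, not taken from the paper.
    table = {"tumor": ["tumour", "neoplasm"], "p53": ["tumor protein 53"]}
    print(expand_query(["tumor", "p53"], table))
```

With the illustrative table above, a document that mentions only "tumour" or "neoplasm" can still match a request phrased with "tumor", which is the kind of mismatch query expansion is meant to reduce.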
Experiments were designed to test the performance of the system and the effect of the query expansion methods. The experimental results show that query expansion has a positive effect on the system. Compared with the baseline system, the Regularized Synonym Expansion Method improves Document MAP, Aspect MAP and Passage MAP by 4.5%, 3.4% and 2.3% respectively, and the Feedback-Based Entity Query Expansion Method improves them by 19.1%, 20.5% and 15.8% respectively. The resulting Document MAP is 0.3445, which ranks first among all groups worldwide that participated in the evaluation.
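As an illustration of the document-level metric, the sketch below computes average precision per topic, mean average precision (MAP) over topics, and the relative gain of one score over another. It covers only a plain document-level MAP; the Aspect MAP and Passage MAP of the Genomics Track are defined differently and are not reproduced here. The function names and the input format (ranked lists of document IDs and sets of relevant IDs) are assumptions, and the abstract does not state whether its reported percentages are relative or absolute gains.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant document IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant document is found.
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of per-topic average precision; topics missing from runs score zero."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(topic, []), rel) for topic, rel in qrels.items())
    return total / len(qrels)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Toy data for illustration only, not results from the paper.
    qrels = {"t1": {"d1", "d3"}, "t2": {"d5"}}
    runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d6", "d5"]}
    print(mean_average_precision(runs, qrels))
```

In the toy data, topic `t1` finds relevant documents at ranks 1 and 3 and topic `t2` at rank 2, so each topic's average precision reflects how early the relevant documents appear in the ranking.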