Research on a Search Engine Indexing System Based on Web Page Segmentation

Author DengZuo
Tutor HeZuoLian
School Tianjin University
Major Applied Computer Technology
Keywords Page segmentation; Indexing system; Classification; Search engine
CLC TP391.3
Type Master's thesis
Year 2009

Existing search engines index each page as a whole and retrieve against that whole-page index. Some pages, however, contain blocks on different topics. If the keywords a user submits fall in blocks on different topics, the search engine still returns the page, even though the page is not relevant to the query. To improve the search engine indexing system, this thesis introduces page segmentation. The segmentation algorithm chosen is VIPS, but in practice the classic VIPS algorithm controls segmentation granularity poorly, producing blocks that are either too coarse or too fine. To address this, a node-depth threshold and a leaf-node-count threshold are introduced so that VIPS can segment adaptively according to the characteristics of each page. Experiments on pages crawled from three portal sites, comparing the improved algorithm against the classic one, confirmed the effectiveness of the improvement.

Each page is first segmented into blocks; based on block content, blocks related to the same topic are combined into sub-documents, and each sub-document is then indexed separately. As a result, the search engine returns the original page only when all of the submitted keywords are contained within a single sub-document. In designing this block-based indexing system, a set of rules was developed to filter out blocks unrelated to the main content, and the remaining blocks are classified.

Finally, three groups of seed keywords were formulated and submitted as queries to Google to obtain a test set, and the retrieval results of the original index and the improved index on this collection were compared. Experiments show that this indexing scheme improves retrieval precision and F1 score to a large extent.
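
The granularity control described above can be sketched as a recursive split over a layout tree. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the abstract names a node-depth threshold and a leaf-node-count threshold but does not give the exact rule, so the rule used here (split a node only while it holds more than `max_leaves` leaves and sits above `max_depth`) is an assumed reading. The `Node` class, the `segment` function, and the parameter names are hypothetical, and VIPS's visual cues (separators, degree of coherence) are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a node of a page's layout tree."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaf_count(self) -> int:
        # A node without children counts as a single leaf.
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)


def segment(node: Node, max_depth: int, max_leaves: int, depth: int = 0) -> List[Node]:
    """Return the blocks selected under `node` (assumed rule, for illustration).

    A node is kept whole if it has no children, holds at most `max_leaves`
    leaves (splitting it would give over-fine blocks), or has reached
    `max_depth`. Otherwise it is treated as too coarse and split into its
    children, each of which is segmented recursively.
    """
    if not node.children or node.leaf_count() <= max_leaves or depth >= max_depth:
        return [node]
    blocks: List[Node] = []
    for child in node.children:
        blocks.extend(segment(child, max_depth, max_leaves, depth + 1))
    return blocks


# Usage: a toy page with a header, an article, a sidebar, and a footer.
page = Node("body", [
    Node("header", [Node("logo"), Node("nav")]),
    Node("main", [
        Node("article", [Node("title"), Node("p1"), Node("p2")]),
        Node("sidebar", [Node("ad1"), Node("ad2"), Node("links")]),
    ]),
    Node("footer"),
])
print([b.label for b in segment(page, max_depth=3, max_leaves=3)])
# ['header', 'article', 'sidebar', 'footer']
```

Raising `max_leaves` yields coarser blocks, while lowering `max_depth` caps how finely a page can be cut: with `max_depth=1, max_leaves=1` the same page yields only `header`, `main`, and `footer`.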

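The retrieval rule for sub-document indexing, where a page matches a multi-keyword query only if every keyword occurs in one sub-document, amounts to intersecting postings keyed by sub-document instead of by page. A minimal sketch, assuming segmentation, noise-block filtering, and classification have already produced each page's sub-documents as plain strings; `SubDocIndex`, the whitespace tokenizer, and the example URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical posting key: (page URL, sub-document number).
SubDocId = Tuple[str, int]


class SubDocIndex:
    """Inverted index whose postings point at sub-documents, not whole pages."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[SubDocId]] = defaultdict(set)

    def add_page(self, url: str, subdocs: List[str]) -> None:
        # Each sub-document is indexed on its own, so its terms never mix
        # with terms from other topic blocks of the same page.
        for i, text in enumerate(subdocs):
            for term in text.lower().split():
                self.postings[term].add((url, i))

    def search(self, query: str) -> List[str]:
        terms = query.lower().split()
        if not terms:
            return []
        # Conjunctive (AND) match: intersecting the posting sets means every
        # keyword must occur in the same sub-document.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Map matching sub-documents back to their original pages.
        return sorted({url for url, _ in hits})


index = SubDocIndex()
index.add_page("http://example.com/portal", ["stock market news today", "football match results"])
index.add_page("http://example.com/sports", ["football match report and market"])
print(index.search("football market"))
# ['http://example.com/sports']
```

A whole-page index would also return the portal page for `football market`, because both words appear on it, only in different topic blocks; the sub-document index filters it out.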