Research and Implementation of Vertical Search Engine
|School||Hebei University of Science and Technology|
|Keywords||vertical search engine; system crawling; Web spider; structured information extraction; index creation; sorting mechanism|
With the rapid development of Internet and computer technology and the spread of personal computers, the channels through which people obtain information have steadily widened. Among these sources, the Internet has growing influence, and it has become one of the main ways people access information. As the volume of Web information grows geometrically, the service provided by traditional search engines no longer meets users' needs. More and more users expect search engines to be more intelligent and user-friendly, and they want results that are more accurate and better matched to their own needs. These new demands place higher requirements on search engine technology, and vertical search engine technology emerged against this background.

This paper studies the application of vertical search engine technology. It first analyzes the characteristics and working principles of vertical search engines. It then examines the internal structure and operating mechanism of Heritrix, a Web crawler (spider) tool, and uses Heritrix to crawl Web content. The crawled pages are processed to extract structured information, and the relevant Web content is stored. The paper also studies the internal structure and operating mechanism of Lucene and uses Lucene to build the index. It examines Lucene's sorting mechanism and optimizes the ordering of results. Finally, it completes a whole vertical search engine system and analyzes the results. The work provides a practical reference for designing vertical search engine systems for corporate websites.
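
The indexing and sorting steps described above can be illustrated with a minimal sketch. It is not the paper's implementation and does not use the Lucene or Heritrix APIs: it assumes the crawled pages have already been reduced to plain text, builds a small in-memory inverted index, and ranks results with a basic TF-IDF score as a stand-in for Lucene's sorting mechanism. The page ids, page texts, and the function names `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


def search(index, n_docs, query):
    """Return doc ids ranked by a basic TF-IDF score, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Rare terms get a higher weight than common ones.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, freq in postings.items():
            scores[doc_id] += freq * idf
    # Sort by descending score; break ties by doc id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical crawled pages, already stripped of markup.
docs = {
    "page1": "Steel pipe products and steel pipe prices",
    "page2": "Company news and contact information",
    "page3": "Steel plate products",
}
index = build_index(docs)
results = search(index, len(docs), "steel pipe")
print(results)
```

Here `page1` ranks above `page3` because it contains both query terms, each twice, while `page2` contains neither and is omitted. A real system built on Lucene would delegate tokenization, index storage, and scoring to the library; the sketch only shows the shape of the data flow from stored page text to a sorted result list.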