Implementation of a Source Code Search Engine Based on the Lucene 2.0 Architecture
|School||Northwestern Polytechnical University|
|Keywords||Search engine; Lucene; Chinese word segmentation; Thread; Liferay Portal|
With the Internet flourishing, the volume of information available online has become enormous. While people enjoy the convenience of the Internet, they also face the problem of finding the information they need accurately and quickly within this mass of data, and Internet search engines emerged in response. Web search engine technology has become a focus of competing research and development in both the computer science community and the information industry. A search engine is a website dedicated to providing query services: it collects information about a large number of sites, either through web crawling software or through site registration, processes that information to build a database, and then answers users' queries and returns the information they require.

Based on the open-source Lucene engine architecture, this paper designs and implements Hicode, a reusable and scalable search engine system dedicated to searching programming-language source code files on the web and in local data, so that users can effectively locate a required fragment of program source code and the source file that contains it. The paper first introduces the open-source Lucene tools used to build the Hicode search engine system. It then uses Java to implement the three core parts of the search engine: the crawler, the indexer, and the searcher. The crawler relies on Java's multithreading mechanism and uses a thread pool to manage multiple crawl threads that fetch web pages concurrently. The indexing and search parts are built on the Lucene engine architecture, with a custom Chinese word segmenter for Lucene that handles Chinese text more effectively. Serialization and JavaCC are also introduced to improve indexing efficiency and development efficiency. Finally, the source code search engine is integrated into the Liferay portal, which provides the user interface.
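The thread-pool crawling mentioned in the abstract can be illustrated with a minimal Java sketch. It is not the Hicode implementation: the class name `CrawlSketch`, the `fetchLinks` function (a stand-in for downloading a page and extracting its links), and the thread count are all assumptions made for illustration. The sketch shows only the general technique of letting a fixed pool of worker threads crawl concurrently while a shared set prevents any URL from being visited twice.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CrawlSketch {

    /**
     * Crawls every URL reachable from the seed and returns the visited set.
     * fetchLinks is a hypothetical hook: given a URL, it returns the links
     * found on that page. A real crawler would download and parse the page.
     */
    public static Set<String> crawl(String seed,
                                    Function<String, List<String>> fetchLinks,
                                    int threads) {
        Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe set
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger pending = new AtomicInteger(0);  // page tasks not yet finished
        CountDownLatch done = new CountDownLatch(1);   // released when pending hits 0

        submit(seed, fetchLinks, visited, pool, pending, done);
        try {
            done.await(); // block until the whole crawl has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    private static void submit(String url,
                               Function<String, List<String>> fetchLinks,
                               Set<String> visited,
                               ExecutorService pool,
                               AtomicInteger pending,
                               CountDownLatch done) {
        if (!visited.add(url)) {
            return; // another thread has already claimed this URL
        }
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                // Child tasks are counted before this task is counted as finished,
                // so pending can only reach zero once the whole crawl is complete.
                for (String link : fetchLinks.apply(url)) {
                    submit(link, fetchLinks, visited, pool, pending, done);
                }
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }
}
```

Calling `CrawlSketch.crawl(seedUrl, fetchLinks, 4)` with a `fetchLinks` function of your own would crawl with four worker threads. A fixed-size pool bounds the number of simultaneous connections, which is one common reason for choosing a thread pool over starting one thread per page.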