Research and Design of a Topical Crawler for Agricultural Information
|School||Beijing University of Posts and Telecommunications|
|Keywords||Topical crawler; information collection; Nutch; Chinese word segmentation|
With the development of Internet technology and the rapid growth of online information resources, the number of Internet users keeps rising and the network plays an ever larger role in daily life and work. People are therefore increasingly concerned with how to extract potentially valuable information from the mass of online content quickly and effectively. Acquiring topic-specific Web information for each professional field is the basis for making effective use of these resources.

A topical crawler for agricultural information identifies agriculture-related Web resources within this mass of online content, collects them, and keeps the collection up to date. It can download the images on crawled pages, convert crawled pages to a unified character encoding, and filter the crawled resources to keep only the pages whose content meets agricultural needs.

This thesis first gives a preliminary description of the intelligent agricultural information service platform, with emphasis on the characteristics of the topical crawler built for it. It then introduces topical crawlers in general: their architecture, principles, components, and workflow. It also describes the processing the crawler applies during information collection to meet the platform's particular resource requirements.

The main focus is the development of the topical crawler for agricultural information. The work starts from the open source search engine Nutch and extends it through secondary development, adding a topic module to the Nutch workflow. The thesis describes the development process and implementation methods in detail, and the results demonstrate that the proposed design and implementation of a topical crawler for agricultural information are feasible and practical.
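The filtering step, in which crawled pages are kept only if they relate to agriculture, could be sketched roughly as follows. This is a minimal illustration, not the thesis's method: the abstract does not say how relevance is scored, so the keyword-ratio score, the `AGRI_TERMS` vocabulary, the `threshold` value, and all function names here are assumptions. Pages are assumed to arrive already tokenized, for example by a Chinese word segmentation step.

```python
from collections import Counter

# Hypothetical topic vocabulary; the thesis does not list its actual terms.
AGRI_TERMS = {"crop", "wheat", "rice", "soil", "fertilizer",
              "irrigation", "livestock", "pest"}


def relevance(tokens, topic_terms=AGRI_TERMS):
    """Return the fraction of tokens that appear in the topic vocabulary."""
    if not tokens:
        return 0.0
    counts = Counter(t.lower() for t in tokens)
    hits = sum(n for term, n in counts.items() if term in topic_terms)
    return hits / len(tokens)


def is_on_topic(tokens, threshold=0.1):
    """Decide whether a page is relevant (threshold is an assumed value)."""
    return relevance(tokens) >= threshold


def filter_pages(pages, threshold=0.1):
    """Keep the URLs of pages judged on-topic.

    pages: iterable of (url, tokens) pairs, tokens being a list of words.
    """
    return [url for url, tokens in pages if is_on_topic(tokens, threshold)]


if __name__ == "__main__":
    sample = [
        ("http://example.com/a",
         ["Wheat", "yield", "depends", "on", "soil", "and", "irrigation"]),
        ("http://example.com/b", ["stock", "market", "news", "today"]),
    ]
    print(filter_pages(sample))
```

In a Nutch-based system, a check of this kind would more plausibly run inside the crawl workflow as a Java module; Python is used here only to keep the sketch short.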