
Design and Implementation of Crawler System for Public Feelings on Internet

Author LiHaiYan
Tutor YangShuangYuan
School Xiamen University
Course Software Engineering
Keywords Public Feelings on Internet; Crawler System; Focused Crawler
CLC TP391.3
Type Master's thesis
Year 2014

Every minute, the Internet generates a huge amount of information covering many domains, such as daily life, technology, and military affairs. Some of this information is negative, and for a large company or organization the spread of a negative message may have serious consequences. More and more companies therefore need a customized system for gathering and monitoring Internet information, for which the completeness and timeliness of the gathered information matter most.

This thesis describes the design and implementation of a crawler system for monitoring public opinion on the Internet. The main work is as follows:

1. Web page downloading and information filtering: The system fetches a large number of HTML pages from specified data sources using keyword-directed crawling. Pages that have already been crawled are filtered out.
2. Key information extraction: Key information is extracted from the downloaded HTML files using both an ontology-based extraction method and a custom extraction method.
3. Data updating and storage: Web pages are updated using improved process prediction algorithms and fixed-time crawling. A shared MongoDB cluster serves as the persistent data store.
4. Job queue and crawler status monitoring: A task queue system controls and manages the status of crawling tasks, and Graphite is used as a real-time monitoring tool.

The research and implementation in this project meet the needs of companies and organizations that are eager to detect negative information.
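
The abstract does not give implementation details, so the following is only a minimal sketch of the first step (keyword-directed crawling with filtering of already-crawled pages), not the thesis's actual method. The names `SeenFilter`, `matches_keywords`, and `select_links` are hypothetical, and the sketch assumes an in-memory set of URL hashes where a real system would more likely use a persistent or distributed store.

```python
import hashlib
from urllib.parse import urldefrag


class SeenFilter:
    """Remembers which URLs were already crawled (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        # Drop the #fragment so the same page is not fetched twice.
        normalized, _fragment = urldefrag(url.strip())
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def select_links(links, keywords, seen):
    """Keep links whose anchor text matches a keyword and whose URL is new.

    `links` is an iterable of (url, anchor_text) pairs. The keyword check
    runs first, so irrelevant links are never recorded as crawled.
    """
    return [
        url
        for url, anchor_text in links
        if matches_keywords(anchor_text, keywords) and seen.is_new(url)
    ]
```

A crawler loop would call `select_links` on the links extracted from each downloaded page and enqueue only the returned URLs, so that both off-topic and previously crawled pages are skipped.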
