Research on Spam Filtering Based on Social Network
|School||Harbin University of Science and Technology|
|Course||Applied Computer Technology|
|Keywords||Spam filtering; feature extraction; machine learning; online active learning|
With the development of Web 2.0 technology, online social networks have become among the most popular platforms, on which millions of users connect and communicate. Users share, exchange, and interact on social networks; at the same time, however, the volume of spam has grown rapidly. It is therefore necessary to clean up these networks and build a healthy social ecosystem, and spam filtering for social networks has become an active research topic. Machine learning techniques have been widely used for spam filtering on social platforms, offering high accuracy, low cost, and a high degree of automation. This paper addresses spam on Sina Weibo, and its content is divided into the following parts.

First, we analyze how spam propagates on social networks and propose a machine-learning-based spam filtering framework to detect suspicious accounts. We implement logistic regression, support vector machine, and random forest models for spam filtering.

Second, we collect multiple features from Weibo accounts for spam detection. The features are extracted from user behavior and content behavior. We trace the message flow through the social graph and define relation features to compute the intimacy between users. Data analysis and experiments are used to evaluate system performance.

Third, taking the practical requirements of the system into account, we propose online active learning for spam filtering. Active learning methods can markedly reduce labeling cost by identifying informative examples, and they speed up the filter.

Finally, spammers often hijack normal users' accounts to forward spam for economic gain. We develop an effective spam zombie detection system that monitors Weibo accounts. It is based on a powerful statistical tool, the Sequential Probability Ratio Test, which is used to detect compromised accounts.
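To make the last step concrete, the following is a minimal sketch of a Sequential Probability Ratio Test applied to one account's message stream. It is not the implementation described in this work. It assumes that each message has already been labeled spam or not spam by a filter, and that these labels behave as independent Bernoulli observations. The function name `sprt_detect` and the parameters `theta0`, `theta1`, `alpha`, and `beta` (spam probability for a normal account, spam probability for a compromised account, and the tolerated false positive and false negative rates) are illustrative assumptions, as are their default values.

```python
import math
from typing import Iterable, Tuple


def sprt_detect(
    observations: Iterable[bool],
    theta0: float = 0.2,   # assumed P(message is spam | account is normal)
    theta1: float = 0.9,   # assumed P(message is spam | account is compromised)
    alpha: float = 0.01,   # tolerated false positive rate (assumed value)
    beta: float = 0.01,    # tolerated false negative rate (assumed value)
) -> Tuple[str, int]:
    """Run Wald's SPRT over per-message spam labels for a single account.

    Returns (decision, number_of_messages_examined), where decision is
    "compromised", "normal", or "undecided" if the stream ends first.
    """
    # Wald's approximate decision boundaries on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))

    # Log-likelihood ratio contribution of one spam / one non-spam message.
    step_spam = math.log(theta1 / theta0)
    step_ham = math.log((1 - theta1) / (1 - theta0))

    llr = 0.0
    count = 0
    for count, is_spam in enumerate(observations, start=1):
        llr += step_spam if is_spam else step_ham
        if llr >= upper:
            return "compromised", count
        if llr <= lower:
            return "normal", count
    return "undecided", count


if __name__ == "__main__":
    # Hypothetical stream: a mostly clean account that starts forwarding spam.
    stream = [False, True, True, True, True, True, True]
    print(sprt_detect(stream))
```

The appeal of the sequential test in this setting is that the number of messages examined is not fixed in advance: the test stops as soon as the accumulated evidence crosses either boundary, so clearly normal or clearly compromised accounts are decided after only a few messages.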