Design and Implementation of the Spam Filtering System Based on VSTO |
Author | ZhangYongHua |
Supervisor | JiangJianGuo; LiWenBin |
School | Xi'an University of Electronic Science and Technology |
Major | Computer Technology |
Keywords | Spam; Naive Bayes classifier; Outlook; Filter plug-in |
CLC | TP393.098 |
Type | Master's thesis |
Year | 2012 |
Downloads | 42 |
Citations | 0 |
Spam filtering is an important and urgent problem in Internet applications and is attracting increasing attention. Spam is nothing new to people who use e-mail frequently. Generally speaking, spam refers to identical e-mails sent by one sender to many different users at the same time for purposes such as commercial advertising or political propaganda. When such e-mails become too numerous, they not only annoy recipients but also interfere with the normal use of e-mail. Spam filtering is essentially a text classification problem, and the naive Bayesian classifier is a simple and effective classification method. Its biggest shortcoming is that it assumes all attributes are conditionally independent given the class, an assumption that is usually not satisfied in practice. However, dropping the conditional independence assumption inevitably leads to a combinatorial explosion, because estimating the full joint distribution would require a parameter for every combination of attribute values. Improved Bayesian spam filtering algorithms have therefore attracted growing interest from researchers.

This paper first studies spam filtering methods and related filtering algorithms and compares the advantages and disadvantages of these typical algorithms. It also studies the protocols for sending and receiving e-mail and analyzes the current state of research on spam filtering technology. Based on the working principle of the e-mail system, it focuses on spam filtering technology based on Bayesian networks, and it uses a worked example to analyze the characteristics and classification accuracy of the naive Bayesian method. Because client-side e-mail filtering software is lacking, a client-side e-mail filtering system is proposed. Finally, an automatic filtering system based on VSTO and Outlook is implemented to solve this problem.
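The naive Bayesian classification discussed above assigns a message with words w1, ..., wn to the class c that maximizes P(c) · P(w1|c) · ... · P(wn|c); the product over individual words is exactly where the conditional independence assumption enters. The following is a minimal Python sketch of a multinomial naive Bayes spam classifier, not the thesis's own implementation. The tokenizer, the class labels `"spam"`/`"ham"`, add-one (Laplace) smoothing, and the toy training messages are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lower-case runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing.

    Each word is treated as conditionally independent of the others given
    the class, which is the naive Bayes assumption.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_posterior(self, text, label):
        # log P(c) + sum_i log P(w_i | c); the shared term log P(d) is dropped.
        # Assumes both classes have at least one training message.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        v = len(self.vocab)
        for w in tokenize(text):
            count = self.word_counts[label][w]
            score += math.log((count + 1) / (total_words + v))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda c: self.log_posterior(text, c))


# Usage with toy, hypothetical training data.
nb = NaiveBayesSpamFilter()
nb.train("win cheap prize money now", "spam")
nb.train("cheap pills buy now", "spam")
nb.train("meeting agenda for project review", "ham")
nb.train("please review the project report", "ham")
print(nb.classify("buy cheap prize now"))       # -> spam
print(nb.classify("project meeting tomorrow"))  # -> ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long messages, and smoothing keeps a word never seen in one class from forcing that class's score to zero.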
The system combines a variety of methods, including manual rules, blacklists, whitelists, automatic rules, single machine-learning filters, integrated filters, and other learning-based filters. For the experiments, new e-mails on the client computer, previously classified as spam or legitimate mail, are used as the test source, and the corresponding feature patterns are extracted. These features are then learned by the classification method to achieve filtering. The system is fully functional and filters well, with a precision of 95%, a false rejection rate of 2%, and a false acceptance rate of 10%, which gives it considerable potential for wider adoption. It can also serve as an Outlook filter plug-in that automatically filters the e-mails in the Outlook inbox. The accuracy and recall of anti-spam filtering remain an important research direction for spam filtering systems, and we will continue to work in this area in the future.
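The abstract lists the filtering means the system combines but not how they interact. The sketch below shows one plausible layering in Python, under assumptions that are not stated in the thesis: whitelists are checked first, then blacklists, then rules (the first rule with an opinion decides), and finally an integrated filter that takes a strict majority vote of several learned classifiers. The names `Email`, `LayeredFilter`, `subject_rule`, the lower-case address convention, and the toy lambda learners are all hypothetical; in practice each learner could be a trained classifier such as a naive Bayes model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Email:
    sender: str
    subject: str
    body: str


# A rule returns "spam", "ham", or None when it has no opinion.
Rule = Callable[[Email], Optional[str]]
# A learned classifier always returns "spam" or "ham".
Learner = Callable[[Email], str]


@dataclass
class LayeredFilter:
    whitelist: Set[str] = field(default_factory=set)  # lower-case addresses
    blacklist: Set[str] = field(default_factory=set)  # lower-case addresses
    rules: List[Rule] = field(default_factory=list)
    learners: List[Learner] = field(default_factory=list)

    def classify(self, mail: Email) -> str:
        sender = mail.sender.lower()
        # 1. Whitelisted senders are always delivered.
        if sender in self.whitelist:
            return "ham"
        # 2. Blacklisted senders are always rejected.
        if sender in self.blacklist:
            return "spam"
        # 3. Manual or automatic rules: the first rule with an opinion decides.
        for rule in self.rules:
            verdict = rule(mail)
            if verdict is not None:
                return verdict
        # 4. Integrated filter: strict majority vote of the learned classifiers.
        if not self.learners:
            return "ham"  # nothing left to consult; deliver by default
        votes = [learner(mail) for learner in self.learners]
        return "spam" if votes.count("spam") * 2 > len(votes) else "ham"


def subject_rule(mail: Email) -> Optional[str]:
    """Hypothetical manual rule: flag subjects containing 'free money'."""
    return "spam" if "free money" in mail.subject.lower() else None


# Usage with toy learners standing in for trained classifiers.
f = LayeredFilter(
    whitelist={"boss@example.com"},
    blacklist={"spammer@example.net"},
    rules=[subject_rule],
    learners=[
        lambda m: "spam" if "prize" in m.body.lower() else "ham",
        lambda m: "spam" if "!!!" in m.body else "ham",
        lambda m: "ham",
    ],
)
print(f.classify(Email("a@b.com", "hi", "You won a prize!!!")))  # -> spam
print(f.classify(Email("a@b.com", "hi", "prize")))  # -> ham
```

Checking the cheap list lookups before any learner runs saves work, and placing the whitelist first means a trusted sender can never be rejected by a later layer, which is one way such a design could help keep the false rejection rate low.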