Emergent Topic Detection, Tracking and Propagation Prediction Technology Research in Micro-blog
|School||Harbin Engineering University|
|Course||Computer System Architecture|
|Keywords||Micro-blog network, Data mining, Topic detection, Propagation model, Propagation prediction|
With the rapid development of the Internet, the network plays an increasingly important role in daily life, and obtaining information of interest from this complex network has become a major problem in the field of data mining. We find that current emergent topic detection techniques cannot detect topics formed by new words, which makes their results inaccurate; that current topic tracking techniques are not suited to the micro-blog network, which consists of short texts; and that current topic propagation prediction is still at an early stage and cannot predict accurately. Based on these findings and on previous research, this paper focuses on burst topic detection and topic forecasting in the micro-blog network, and offers solutions to the three problems as follows.

In view of the flexibility of the micro-blog network, we present keyword-based burst topic detection and tracking, so that burst topics formed by new terms are detected as early as possible. We propose a term weighting method based on message weight to improve weighting accuracy; define a “micro-blog number window” in place of the “time window” to improve the efficiency and speed of the detection system; propose a correlation calculation algorithm that combines similarity and degree of overlap, preserving both the accuracy and the speed of the correlation calculation; and propose a tracking algorithm that addresses the problem of tracking a drifting topic.

After an in-depth study of the virus infection model, the message propagation model and the topic propagation model, we propose a topic propagation model based on micro-blog followers (fans), user activity and user influence. Micro-blog users are divided into three groups: infected users, infecting users and insulating users. Propagation is affected mainly by three factors: the degree of infection of the infected users, the activity of the infecting users and the burstiness of the burst topic.
We then analyse the relationship between the infected users and the infecting users, and forecast the number of infecting users in the next time window. In addition, we adopt the notion of “inside and outside field strength” and assume that the two are related in a specific way. According to the scale of the user base, this paper gives one topic propagation model based on users and one based on scale. The former is more accurate but has high time complexity, while the latter is better suited to massive data.

To sum up, this paper mainly studies burst topic detection and tracking based on text content. Because existing burst detection algorithms cannot find new terms, we propose a word-based burst detection method. For topic propagation prediction, we consider the micro-blog transmission path, the propagation probability and the virus propagation model, and give topic propagation models based on users and on scale. The proposed detection and prediction algorithms are verified by experiment, and the paper closes with the overall framework of the system and the outlook for the field.
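
The abstract gives no formulas, so the following is only a minimal sketch of how a “micro-blog number window” might drive word-based burst detection. Posts are grouped into windows of a fixed number of posts rather than a fixed time span, each term is weighted by the messages that contain it, and a term is flagged when its weight jumps relative to recent windows. A term that has never been seen has a baseline of zero, so it can be flagged in the first window where it appears. Everything concrete here is an assumption for illustration, not the paper's method: the `message_weight` formula, whitespace tokenization, the ratio test with add-one smoothing, and all parameter values.

```python
from collections import Counter, deque
import math


def message_weight(reposts):
    # Assumed weighting: a post with more reposts counts for more.
    return 1.0 + math.log1p(reposts)


def count_windows(posts, size):
    """Yield consecutive windows of exactly `size` posts.

    A trailing partial window is dropped.
    """
    for start in range(0, len(posts) - size + 1, size):
        yield posts[start:start + size]


def term_weights(window):
    """Sum the message weight of every post that contains each term."""
    weights = Counter()
    for text, reposts in window:
        w = message_weight(reposts)
        # Assumed tokenizer: lowercase whitespace split, one count per post.
        for term in set(text.lower().split()):
            weights[term] += w
    return weights


def detect_bursts(posts, size=4, history=3, ratio=3.0, min_weight=2.0):
    """Return one sorted list of burst terms per window.

    posts: list of (text, repost_count) pairs in arrival order.
    The first window has no history, so it never reports a burst.
    """
    past = deque(maxlen=history)  # term weights of the most recent windows
    results = []
    for window in count_windows(posts, size):
        current = term_weights(window)
        bursts = []
        if past:
            for term, w in current.items():
                # Counter returns 0 for unseen terms, so new terms get baseline 0.
                baseline = sum(p[term] for p in past) / len(past)
                if w >= min_weight and w / (baseline + 1.0) >= ratio:
                    bursts.append(term)
        results.append(sorted(bursts))
        past.append(current)
    return results
```

Because a window closes after a fixed number of posts, it fills quickly when traffic surges and slowly when traffic is quiet, which is one plausible reading of why the abstract prefers it to a time window. For Chinese micro-blog text the whitespace split would need to be replaced by a word segmenter.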