Research of Multi Label Text Classification Based on Label Threshold Judgment
|School||Dalian Maritime University|
|Course||Computer Science and Technology|
|Keywords||kNN Algorithm, Threshold Judgment, Multi-label Text Classification, Fuzzy Similarity|
As the volume of data grows, it is increasingly difficult for users to find useful information, and retrieval speed is also challenged. Research on automatic text classification is therefore increasingly important. There are many text categorization methods, such as the vector space model method, association-based classification, simple vector distance classification, the naive Bayes classification algorithm, the Support Vector Machine (SVM), the k Nearest Neighbor algorithm (kNN), and vocabulary-based classification. Current research on these algorithms aims to maximize time efficiency while improving classification accuracy.
Because of the uncertainty of text classification, a text may belong to more than one category, so multi-label classification algorithms have attracted the attention of many scholars. Current multi-label classification algorithms mainly improve classification accuracy by optimizing the distance computation and reduce time complexity by designing more appropriate classification methods. Because text classification algorithms require considerable computation, improving accuracy while also improving time efficiency remains one of the key problems. The kNN algorithm is simple and easy to implement, so many scholars study kNN for text categorization. This paper presents a kNN algorithm based on label threshold judgment.
The kNN algorithm based on label threshold judgment proposed in this paper uses the fuzzy similarity algorithm from FSkNN to cluster texts and narrow the scope of the k-nearest-neighbor search, thereby reducing the time cost. The algorithm computes a threshold for every category from the memberships of the texts in the training set. It then uses the kNN algorithm to find the k nearest neighbors of the test text.
According to the label vectors of the k nearest neighbors, we compute the membership degree of the test text for every category. For every category, if the membership degree is greater than or equal to the threshold, the label of the text for that category is set to 1; otherwise it is set to 0. It can happen that all the labels of a text are zero, which means the text is lost (assigned to no category). This paper presents a zero-label modification algorithm for this situation: if all the labels are zero, the algorithm is run to modify the result.
The experimental results show that the proposed algorithm is more efficient and achieves higher accuracy.
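The labeling step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the membership formula, the threshold computation, or the zero-label modification rule, so the following are assumptions made only for illustration: cosine similarity stands in for the fuzzy similarity measure, membership is a similarity-weighted average of the neighbors' label vectors, the per-category thresholds are passed in as given values, and the zero-label fallback assigns the category whose membership falls least below its threshold. The names `cosine` and `predict_labels` are hypothetical, and the fuzzy-similarity clustering step is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (stand-in measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_labels(test_vec, train_vecs, train_labels, thresholds, k=3):
    """Return (labels, membership) for one test text.

    train_labels[i] is the 0/1 label vector of training text i;
    thresholds[c] is the (precomputed) threshold of category c.
    """
    # Find the k training texts most similar to the test text.
    sims = [cosine(test_vec, v) for v in train_vecs]
    order = sorted(range(len(train_vecs)), key=lambda i: sims[i], reverse=True)
    nearest = order[:k]
    total = sum(sims[i] for i in nearest)

    # Assumed membership: similarity-weighted share of neighbors in each category.
    membership = []
    for c in range(len(thresholds)):
        if total > 0:
            weighted = sum(sims[i] * train_labels[i][c] for i in nearest)
            membership.append(weighted / total)
        else:
            membership.append(0.0)

    # Threshold judgment: label is 1 when membership >= threshold, else 0.
    labels = [1 if m >= t else 0 for m, t in zip(membership, thresholds)]

    # Assumed zero-label fallback: pick the category closest to its threshold.
    if not any(labels):
        best = max(range(len(thresholds)),
                   key=lambda c: membership[c] - thresholds[c])
        labels[best] = 1
    return labels, membership


if __name__ == "__main__":
    train_vecs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
    train_labels = [[1, 0], [1, 1], [0, 1], [0, 1]]
    print(predict_labels([1, 0], train_vecs, train_labels, [0.5, 0.5], k=2))
    print(predict_labels([1, 1], train_vecs, train_labels, [0.9, 0.9], k=4))
```

In the second call every membership falls below its threshold, so the fallback assigns exactly one label instead of leaving the text unclassified. Any other repair rule could be substituted at that point.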