The Multi-label Learning Algorithm Study Based on Data Conversion and Co-training Technology
|School||Shandong Normal University|
|Course||Computer Software and Theory|
|Keywords||Multi-label learning; weighted K nearest neighbors; multi-instance learning; semi-supervised learning|
Multi-label problems arise widely in real-world applications and are currently a research focus in machine learning and data mining. Multi-label learning provides an effective solution to the complex issues raised by ambiguous objects. A large number of multi-label learning algorithms already exist and have been widely applied in text categorization, bioinformatics, scene classification, automatic video annotation, and many other fields. However, existing multi-label learning approaches focus on the traditional supervised learning framework. They can be broadly divided into three categories. The first converts the multi-label learning problem into a set of two-class classification problems, one per label. This works well when there are few labels and plenty of samples, but with many labels the samples for each label become sparse, and because the relationships between labels are ignored, performance is often poor. The second converts the multi-label learning problem into a label ranking problem. This approach focuses on ranking the labels correctly, but it requires learning an additional threshold function to obtain the final set of relevant labels, and learning that threshold function is itself a difficult problem. The third combines the multi-label learning problem with structural information between labels; when this structural information is used properly, excellent performance can be obtained.
Without guidance from domain knowledge, however, it is almost impossible to know how the structural information should best be used.
Building on the above methods and taking improved classification accuracy as its starting point, this paper defines and extracts multi-label sample sets, proposes a multi-label learning algorithm based on weighted neighbors and multi-instance learning, and combines multi-label learning with semi-supervised learning, in order to further study the accuracy of multi-label learning algorithms.
The main research work and proposed innovations of this paper are summarized as follows:
1. Problem transformation in multi-label learning. Existing multi-label algorithms include the idea of converting a multi-label problem into a multi-instance multi-label problem, which improves classification performance to some extent but leaves room for improvement in classification accuracy and time complexity. This paper introduces KNN together with a weighting method: for each possible class, it finds the k nearest neighbors of a given sample that belong to that class, takes the weighted average of those neighbors to obtain a mean vector, and thereby converts the sample into a bag. This preserves the local distribution characteristics of the data and improves classification accuracy.
2. Multi-label learning based on large amounts of unlabeled data. In real-world problems it is often easy to obtain large amounts of unlabeled data, and because each object may carry several labels, obtaining labeled data becomes considerably harder. Semi-supervised learning can therefore be used to improve the performance of multi-label classification. The co-training idea is applied to multi-label learning: a local KNN and a global KNN are trained to obtain two classifiers, which label the unlabeled examples and add them to the training set.
The co-training process iterates until training finishes. Considering the training set from both local and global perspectives improves classification accuracy.
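The sample-to-bag conversion described in the first contribution can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `sample_to_bag`, the Euclidean distance, and the weighting `1 / (1 + distance)` are all assumptions made for illustration, since the abstract does not specify them. For each class, the sketch takes the k training samples carrying that label that lie nearest to the input sample and averages them with those weights, giving one instance per class.

```python
import numpy as np


def sample_to_bag(x, X_train, Y_train, k=3):
    """Convert one sample into a bag with (at most) one instance per class.

    x        : feature vector, shape (d,)
    X_train  : training features, shape (n, d)
    Y_train  : binary label matrix, shape (n, q); Y_train[i, c] == 1
               means training sample i carries label c
    k        : number of same-class neighbors to average

    The weighting 1 / (1 + distance) is an assumption for illustration;
    the source only says the neighbors are "weighted and averaged".
    """
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train)

    bag = []
    for c in range(Y_train.shape[1]):
        # Training samples that carry label c.
        members = X_train[Y_train[:, c] == 1]
        if len(members) == 0:
            continue  # no sample has this label, so no instance is produced

        # The k members nearest to x (Euclidean distance, assumed).
        dists = np.linalg.norm(members - x, axis=1)
        nearest = np.argsort(dists)[:k]

        # Closer neighbors contribute more to the class's mean vector.
        weights = 1.0 / (1.0 + dists[nearest])
        bag.append(np.average(members[nearest], axis=0, weights=weights))

    return np.array(bag)


# Toy data: four 2-D samples, two labels.
X_train = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
bag = sample_to_bag(np.array([1.0, 0.0]), X_train, Y_train, k=2)
```

Each row of `bag` summarizes the neighborhood of the sample within one class, which is how the sketch tries to keep the local distribution of the data. The resulting bag could then be passed to any multi-instance learner.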