Research and Application of the Diverse Density Learning Algorithm
|School|Harbin Institute of Technology|
|Course|Computer Science and Technology|
|Keywords|multiple-instance learning; multiple-concept DD algorithm; overlapped instances; classification; image retrieval; microRNA precursors|
Multiple-instance learning is regarded as the fourth machine learning framework, after supervised learning, unsupervised learning and reinforcement learning. It has been applied to drug design, image retrieval and other research fields with promising results. In multiple-instance learning, the training samples are bags composed of multiple instances; the bags are labeled but the instances are not, and the goal of learning is to predict the labels of new bags. The Diverse Density (DD) algorithm is a typical multiple-instance learning algorithm that can learn an objective function close to the target concept, but it still has two shortcomings. First, DD learns only one objective function, which limits its learning ability. Second, DD labels a bag positive if it contains at least one positive instance and negative otherwise, so it predicts the labels of new bags without taking into account that positive instances are sparse in the samples. As a result, when bags contain overlapped instances, it often misclassifies negative bags as positive.

To address the first shortcoming, the paper proposes a multiple-concept DD algorithm, which describes the target concept more comprehensively by learning multiple objective functions. To address the second, it notes that, because positive instances are sparse, some negative bags that contain overlapped instances are classified as positive. The paper therefore proposes a classification algorithm based on overlapped instances, which changes how strongly each instance influences the label of its bag during classification.

The paper also applies these improvements to image retrieval and to the classification of real and pseudo microRNA precursors in bioinformatics. In image retrieval, the multiple-concept DD algorithm captures the objects the user is interested in, and the new classification algorithm eliminates noise that carries positive-instance characteristics.
Furthermore, the retrieval results are better than those of the original DD algorithm. In addition, the new classification method has been used to classify real and pseudo microRNA precursors in bioinformatics and has achieved excellent results.
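The DD objective and the classification rules discussed above can be illustrated with a small sketch. It assumes the noisy-or formulation from Maron and Lozano-Pérez's original DD work, with a Gaussian-like instance similarity and fixed per-feature scales (`SCALE`, a hypothetical choice), and it maximizes DD only over the instances of positive bags (the original uses these as starting points for gradient ascent). The abstract does not say how the multiple-concept DD algorithm or the overlapped-instance classifier actually work, so `learn_concepts` (greedy covering of the positive bags) and `classify_fraction` (a bag is positive only if at least `min_frac` of its instances match a concept) are hypothetical stand-ins, not the paper's methods. All function names, toy bags and thresholds are illustrative.

```python
import math

# Hypothetical fixed per-feature scale factors (DD can also learn these).
SCALE = (0.5, 0.5)

def instance_prob(x, t, s=SCALE):
    """Pr(x belongs to the concept centred at t) = exp(-sum_k (s_k (x_k - t_k))^2)."""
    return math.exp(-sum((sk * (xk - tk)) ** 2 for xk, tk, sk in zip(x, t, s)))

def log_dd(t, pos_bags, neg_bags):
    """Log of the noisy-or diverse density at point t (higher is better)."""
    total = 0.0
    for bag in pos_bags:
        # Positive bag: at least one instance should lie near t.
        miss = math.prod(1.0 - instance_prob(x, t) for x in bag)
        total += math.log(max(1.0 - miss, 1e-300))  # guard against log(0)
    for bag in neg_bags:
        # Negative bag: no instance should lie near t.
        miss = math.prod(1.0 - instance_prob(x, t) for x in bag)
        total += math.log(max(miss, 1e-300))
    return total

def learn_concept(pos_bags, neg_bags):
    """Crude maximisation: score only the instances of positive bags as candidates."""
    candidates = [x for bag in pos_bags for x in bag]
    return max(candidates, key=lambda t: log_dd(t, pos_bags, neg_bags))

def learn_concepts(pos_bags, neg_bags, k=2, thr=0.5):
    """Hypothetical multiple-concept variant: greedily cover the positive bags."""
    concepts, remaining = [], list(pos_bags)
    while remaining and len(concepts) < k:
        t = learn_concept(remaining, neg_bags)
        concepts.append(t)
        # Keep only the positive bags that no learned concept explains yet.
        remaining = [b for b in remaining
                     if max(instance_prob(x, t) for x in b) < thr]
    return concepts

def instance_scores(bag, concepts):
    """Best concept-membership probability for each instance of a bag."""
    return [max(instance_prob(x, t) for t in concepts) for x in bag]

def classify_max(bag, concepts, thr=0.5):
    """Standard DD rule: one instance near a concept makes the bag positive."""
    return max(instance_scores(bag, concepts)) >= thr

def classify_fraction(bag, concepts, thr=0.5, min_frac=0.4):
    """Hypothetical overlap-aware rule: one overlapped instance is not enough."""
    scores = instance_scores(bag, concepts)
    return sum(p >= thr for p in scores) / len(scores) >= min_frac

# Toy data: one concept near the origin (three bags), another near (6, 6).
pos = [[(0, 0), (7, -7)], [(0.2, 0.1), (-7, -7)],
       [(-0.1, 0.2), (7, 0)], [(6, 6), (-6, 6)]]
neg = [[(-6, 6.2), (6, -6)], [(-7, 0), (0, 7)]]
concepts = learn_concepts(pos, neg)

noisy = [(0.1, 0.0), (9, 9), (-9, 9), (9, -9)]  # one overlapped instance
print(classify_max(noisy, concepts), classify_fraction(noisy, concepts))
# prints: True False
```

With a single overlapped instance among four, the standard rule labels the noisy bag positive while the fraction rule rejects it. The trade-off is that a genuinely positive bag with very few positive instances could also be rejected, so a threshold such as `min_frac` would have to be tuned on validation data.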