Research on Pattern Recognition Algorithm Based on Feature Evaluation
|School||Harbin Institute of Technology|
|Course||Applied Computer Technology|
|Keywords||Feature weight; Feature subset partition; Weighted fuzzy c-means; Fuzzy nearest neighbor classifier; Image recognition|
Euclidean distance is a commonly used similarity measure in pattern recognition algorithms. It assumes that every feature plays an equal role, which does not hold in practice. When the feature dimensionality is high, the Euclidean distance may be dominated by irrelevant features, and the performance of any algorithm built on it degrades; this effect is called the curse of dimensionality. Feature selection can lessen it, but feature selection performs best only when each feature is either highly relevant to the class or completely irrelevant. In this study, feature evaluation is used to handle features whose relevance to the class varies by degree.

To address the curse of dimensionality in fuzzy c-means, a feature weight learning algorithm based on the index CFuzziness is proposed. The algorithm assigns each feature an importance degree that denotes its role in clustering. With appropriate feature weights, data within one class become more similar and data in different classes become more separated, so clustering performance improves. The appropriate feature weights are learned by minimizing the index CFuzziness through gradient descent. Incorporating the feature weights into fuzzy c-means yields the weighted fuzzy c-means algorithm, which emphasizes important features and lessens the role of irrelevant ones. Experimental results show that weighted fuzzy c-means outperforms fuzzy c-means in clustering.

To address the curse of dimensionality in the nearest neighbor classifier, two multiple classifier systems are proposed, based on different feature subset partition methods. First, each system decomposes the feature set into several feature subsets. Then each feature subset is classified by one component classifier.
Finally, the decisions of the component classifiers are combined. Because each feature subset has low dimensionality, the curse of dimensionality is lessened. If the component classifiers generated by the feature subset partition method are both diverse and accurate, the multiple classifier system performs better.

In this paper, a genetic algorithm (GA) and mutual information are used to partition the feature set into subsets. Guided by the accuracy of the multiple classifier system, GA partitions the feature set automatically with a global search strategy; it is a wrapper method and may select a feature subset suited to each component classifier. Mutual information selects salient feature subsets according to the relevance between feature and class with a forward greedy search strategy; it is a filter method and may be computationally efficient.

A fuzzy nearest neighbor classifier is proposed and adopted as the component classifier. Whereas the nearest neighbor classifier outputs only the class of a datum, the fuzzy nearest neighbor classifier outputs the membership degree of the datum in each class.

The fuzzy integral, taken with respect to a fuzzy measure, is adopted to combine the decisions of the component classifiers. The fuzzy measure expresses the importance degree of each feature subset, which is learned from training data. Compared with other combination methods, the fuzzy integral considers not only the output of each component classifier but also the importance degree of each feature subset, and therefore outperforms them. Experimental results show that both multiple fuzzy nearest neighbor classifier systems, one with feature subsets partitioned by GA and one by mutual information, achieve better classification performance than the nearest neighbor classifier.

In this paper, the three proposed methods are used to recognize images from the Corel image database.
Four datasets are extracted from the Corel image database using color histogram, color coherence vector, PWT and Hu moments, respectively, and serve as input to the image recognition experimental system. Experimental results of image clustering show that weighted fuzzy c-means is superior to fuzzy c-means. Image classification uses the two multiple fuzzy nearest neighbor classifier systems, with feature subsets partitioned by GA and by mutual information, respectively. The experimental results show that both systems improve on the image classification performance of the nearest neighbor classifier. Because GA and mutual information adopt different strategies to partition the feature set, the relative performance of the two multiple classifier systems depends on the dataset.
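
The effect of feature weights on fuzzy memberships, as described for weighted fuzzy c-means above, can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that each weight scales the squared difference of its feature inside the distance, and it uses the standard fuzzy c-means membership update. The function names, the fuzzifier `m`, the fixed cluster centers and the toy data are all hypothetical. The weights are set by hand here, whereas the thesis learns them by minimizing the index CFuzziness, a step this sketch does not attempt.

```python
def weighted_sq_distance(x, center, weights):
    """Squared Euclidean distance with each feature term scaled by its weight.

    Assumption: the weight multiplies the squared difference directly; the
    thesis may define the weighted distance differently.
    """
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights))


def fcm_memberships(x, centers, weights, m=2.0):
    """Membership of point x in each cluster (standard fuzzy c-means update).

    Memberships are proportional to d^(-2/(m-1)), computed here from the
    squared distance, and normalized to sum to 1.
    """
    d2 = [weighted_sq_distance(x, c, weights) for c in centers]
    # A point that coincides with one or more centers belongs only to them.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


# Toy data (hypothetical): feature 0 separates the clusters,
# feature 1 is treated as irrelevant noise on a larger scale.
centers = [(0.0, 0.0), (1.0, 3.0)]
point = (0.1, 3.0)

uniform = fcm_memberships(point, centers, weights=(1.0, 1.0))
weighted = fcm_memberships(point, centers, weights=(1.0, 0.0))
print("equal weights:     ", uniform)
print("noise feature at 0:", weighted)
```

With equal weights the large-scale second feature dominates the distance and the point is assigned mainly to the second center. Setting the weight of that feature to zero moves the dominant membership to the first center, which is the closer one along the informative feature.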