Interval Number-Based Uncertain Data Mining and Its Applications
|Course||Control Science and Engineering|
|Keywords||Interval number, Regression, Classification, SVM, Kernel methods, Multi-scale learning, Genetic algorithms, Fuzzy clustering algorithm, Data mining|
Data mining has been applied successfully in many fields, but in the natural and social sciences much of the data is uncertain and imprecise because of the limits of measurement technology, the inherent uncertainty of the objects being measured, and other factors. Applying inappropriate data mining methods to such uncertain data yields mining models of poor, even unacceptable, quality. How to extract hidden knowledge from uncertain and imprecise data is therefore worth researching. According to the uncertainty theory on which they are based, four major approaches have been proposed for data mining under uncertainty: data mining for stochastic data, for grey data, for fuzzy numbers, and for interval numbers.

To overcome the shortcomings of existing data mining approaches when applied to uncertain, imprecise, and voluminous production data, this thesis takes a national 863 project on steel and iron production quality control as its case study, proposes SVM and kernel method-based data mining models for interval numbers, and applies the proposed algorithms to quality analysis and prediction. The main research work is as follows:

1. Two regression analysis models for interval numbers are proposed, motivated by practical problems in steel and iron production. (1) An SVM-based regression model with interval number input and interval number output. The SVM regression method is generalized from the real number domain to the interval number domain while keeping the merits of SVM. (2) An SVM-based regression model with precise number input and interval number output. The method exploits the relationship between the upper bound and lower bound regression models and trains them simultaneously. By solving a convex quadratic optimization problem, the proposed algorithm prevents the lower bound from exceeding the upper bound.

2. A multi-scale radial wavelet SVM robust regression model with precise number input and interval number output is proposed. A multi-scale radial wavelet SVM is presented and used to train the upper bound and lower bound regression models separately on a multi-scale interval number sample set that contains outliers. The two regression models are robust, generalize well, and approximate multi-scale signals effectively; the regression residuals of normal samples are small, while those of outliers are large. A weighted M-estimator is then taken as the cost function, and a gradient algorithm is adopted to adjust the parameters of the upper bound and lower bound regression models simultaneously. The weights change with the relationship between the upper bound and lower bound regression models. In this way the effect of outliers is reduced gradually, and the lower bound is prevented from exceeding the upper bound.

3. An SVM-based classification method for interval numbers is presented. Using a comparison rule for interval numbers, the linear classification model with interval number input is transformed into one with precise number input, and an SVM-based classifier for interval numbers is then constructed. By designing an appropriate kernel function, the interval number samples are mapped to a high-dimensional feature space in which a linear classification model is built. This handles interval number sample sets that are not linearly separable, and the proposed algorithm overcomes deficiencies of existing classification algorithms for interval numbers, such as sensitivity to the input dimension and suitability only for small sample sets.

4. A kernel-based fuzzy clustering algorithm for interval numbers is presented. By designing an appropriate kernel function, the algorithm can effectively cluster asymmetric or mixed data structures with multi-pattern prototypes while avoiding explicit computation in the feature space. An interval number genetic algorithm is also designed to solve the highly non-convex optimization problem and obtain the global solution of the clustering problem, which greatly improves clustering quality. The proposed algorithm overcomes a common shortcoming of existing interval number clustering approaches, namely unsatisfactory clustering performance on multi-pattern clusters and unbalanced data structures.

5. Several interval-based uncertain data mining methods are applied to steel and iron process data, and their results are compared with those of existing methods. The results show that the proposed approaches overcome the deficiencies of the existing methods and achieve better results in steel and iron process data mining.
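
Both the classification method and the clustering algorithm described above depend on a kernel function defined over interval numbers. The abstract does not state which kernel the thesis uses, so the following is only a minimal sketch under an assumption: each sample is a list of (lower, upper) pairs, and a Gaussian kernel is applied to the squared differences of the interval endpoints. The function names, the `gamma` parameter, and the sample values are all hypothetical. The second function illustrates how a squared distance in the feature space can be computed from kernel values alone, which is one way a kernel-based clustering method can avoid working in the feature space directly.

```python
import math


def interval_rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel between two interval-valued samples.

    Each sample is a list of (lower, upper) pairs, one per feature.
    Assumption: the squared distance sums the squared differences of
    the interval endpoints; the thesis's actual kernel is not given.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same number of features")
    sq_dist = 0.0
    for (x_lo, x_hi), (y_lo, y_hi) in zip(x, y):
        if x_lo > x_hi or y_lo > y_hi:
            raise ValueError("lower bound exceeds upper bound")
        sq_dist += (x_lo - y_lo) ** 2 + (x_hi - y_hi) ** 2
    return math.exp(-gamma * sq_dist)


def feature_space_sq_distance(x, y, gamma=0.5):
    """Squared distance between the feature-space images of x and y,
    computed from kernel values only: k(x,x) + k(y,y) - 2 k(x,y)."""
    return (
        interval_rbf_kernel(x, x, gamma)
        + interval_rbf_kernel(y, y, gamma)
        - 2.0 * interval_rbf_kernel(x, y, gamma)
    )


# Hypothetical interval-valued measurements with two features each.
a = [(1.0, 2.0), (0.5, 0.7)]
b = [(1.0, 2.0), (0.5, 0.7)]
c = [(4.0, 6.0), (0.1, 0.9)]

print(interval_rbf_kernel(a, b))        # identical intervals give 1.0
print(interval_rbf_kernel(a, c))        # distant intervals give a value near 0
print(feature_space_sq_distance(a, c))  # approaches 2 as the kernel approaches 0
```

Because this kernel is an ordinary Gaussian kernel on the stacked endpoint vector, it is positive definite and could in principle be passed to any kernel method that accepts a precomputed Gram matrix. Whether it suits asymmetric or unbalanced interval data as well as the kernel designed in the thesis is not something this sketch can show.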