Research on Data Analysis and a Pre-selection Algorithm for Support Vector Machine Speech Recognition
|School|Taiyuan University of Technology|
|Course|Information and Communication Engineering|
|Keywords|speech recognition; support vector machine; kernel matrix; principal component analysis; support vector pre-selection|
With the development of fundamental scientific theory, artificial intelligence technology has improved continuously and has a very large space for application. Speech recognition technology based on artificial intelligence algorithms is gradually maturing, and recognition systems are becoming more and more powerful, enabling human-machine interaction, voice control, and similar applications. The support vector machine (SVM) is a statistics-based recognition algorithm that overcomes the problems of small training sets, linear inseparability, the curse of dimensionality, and local optima. Its generalization ability and classification accuracy are much better than those of many other algorithms, which makes it well suited to speech recognition systems. Although the SVM handles small sample sets well, training it involves a large number of matrix operations, and the kernel matrix occupies a large amount of storage, which leads to long training times. The purpose of this research is to reduce SVM training time when the amount of data is large. To this end, working on the dimension of the input speech data, this paper applies principal component analysis (PCA) to the MFCCs (Mel-frequency cepstral coefficients) of the speech data and uses the eigenvalue contribution rate to decide how many components to keep, significantly reducing SVM training time while ensuring that the recognition accuracy of the speech recognition system does not drop.
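The PCA step can be sketched as follows. This is a minimal NumPy sketch, not the implementation used in the paper: the function name `pca_by_contribution`, the 13-dimensional MFCC frames, the 0.95 contribution threshold, and the synthetic data are all assumptions for illustration. It keeps the fewest principal components whose cumulative eigenvalue contribution rate reaches the threshold, so the SVM is then trained on shorter feature vectors.

```python
import numpy as np


def pca_by_contribution(features: np.ndarray, threshold: float = 0.95):
    """Reduce feature dimension with PCA, keeping the fewest principal
    components whose cumulative eigenvalue contribution rate reaches
    `threshold` (an assumed value, not taken from the paper).

    features: shape (n_samples, n_dims), e.g. one MFCC vector per row.
    Returns (reduced, components, mean, k).
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Sample covariance matrix of the feature dimensions (n_dims x n_dims).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Cumulative contribution rate: running sum of eigenvalues / total variance.
    contribution = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k with contribution[k - 1] >= threshold (clipped for round-off).
    k = min(int(np.searchsorted(contribution, threshold)) + 1, len(eigvals))
    components = eigvecs[:, :k]
    reduced = centered @ components
    return reduced, components, mean, k


# Hypothetical stand-in for 13-dimensional MFCC frames:
# two strong directions of variation plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([3.0, 2.0])
mixing = np.zeros((2, 13))
mixing[0, 0] = 1.0
mixing[1, 1] = 1.0
mfcc_like = latent @ mixing + rng.normal(scale=0.01, size=(500, 13))

reduced, components, mean, k = pca_by_contribution(mfcc_like, threshold=0.95)
print(k, reduced.shape)  # → 2 (500, 2)
```

Test utterances would be projected with the `mean` and `components` learned on the training set, i.e. `(new_features - mean) @ components`, so that training and test vectors share the same reduced space.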
In addition, because only the support vectors contribute to the decision function of the trained model, a grouped-training pre-selection algorithm is proposed from the perspective of data selection and training method. This method removes redundant training data and reduces the number of training samples, thereby shortening the training time of the speech recognition system. Finally, experiments on a large sample data set, using speech samples of different sizes and different signal-to-noise ratios (SNRs), verify that both applying PCA to the MFCCs and training the SVM with the grouped pre-selection algorithm shorten training time, which confirms the effectiveness of the two methods proposed in this paper.
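One way to realize grouped-training pre-selection is sketched below, under stated assumptions. The abstract does not specify the grouping rule, so this hypothetical version shuffles the data into `n_groups` disjoint groups, trains an SVM on each, keeps only each group's support vectors (samples with nonzero dual coefficient `alpha`), and retrains on their union. To stay dependency-free it swaps in a linear SVM trained by dual coordinate descent, with the bias folded into a constant feature; the paper works with a kernel SVM, but the selection logic is the same. All function names, parameters, and data here are illustrative.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, epochs=200, seed=0):
    """Dual coordinate descent for an L1-loss linear SVM (bias folded into X).

    Returns (w, alpha); training points with alpha > 0 are the support vectors.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    alpha = np.zeros(n)
    w = np.zeros(X.shape[1])
    q_diag = np.einsum("ij,ij->i", X, X)  # Q_ii = x_i . x_i
    for _ in range(epochs):
        max_pg = 0.0
        for i in rng.permutation(n):
            g = y[i] * (w @ X[i]) - 1.0  # gradient of the dual objective
            # Projected gradient respects the box constraint 0 <= alpha_i <= C.
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            max_pg = max(max_pg, abs(pg))
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q_diag[i], 0.0), C)
                w += (alpha[i] - old) * y[i] * X[i]  # keep w = sum alpha_j y_j x_j
        if max_pg < 1e-6:  # converged: no coordinate can improve
            break
    return w, alpha


def group_preselect(X, y, n_groups=4, C=1.0, tol=1e-8, seed=0):
    """Hypothetical grouped pre-selection: train on each group, keep only
    that group's support vectors, then retrain on their union."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(X)), n_groups)
    selected = []
    for idx in groups:
        _, alpha = train_linear_svm(X[idx], y[idx], C=C)
        selected.append(idx[alpha > tol])
    selected = np.concatenate(selected)
    w, _ = train_linear_svm(X[selected], y[selected], C=C)
    return w, selected


# Hypothetical two-class data standing in for reduced speech features.
rng = np.random.default_rng(1)
n = 200
pos = rng.normal(loc=2.0, size=(n, 2))
neg = rng.normal(loc=-2.0, size=(n, 2))
X = np.hstack([np.vstack([pos, neg]), np.ones((2 * n, 1))])  # constant bias column
y = np.concatenate([np.ones(n), -np.ones(n)])

w, selected = group_preselect(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(len(selected), "of", len(X), "samples kept; accuracy", accuracy)
```

Each group's SVM solves a much smaller problem, and because non-support vectors do not affect the decision function, discarding them before the final pass should cost little accuracy. It is an approximation, though: a sample that is not a support vector within its own group can still be one for the full data set.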