Compensation Methods of Different Speech Coding for Speaker Recognition
|School||Harbin Institute of Technology|
|Course||Computer Science and Technology|
|Keywords||Speaker identification; Text-independent; Speech coding; Maximum A Posteriori estimation; Maximum Likelihood estimation; Score compensation|
Speaker recognition offers many advantages, including flexibility, low cost, accuracy, and extensibility, and therefore has broad application prospects in the field of biometric recognition. Although such systems perform well in the laboratory, their performance degrades rapidly in the real world under the influence of various factors. One of the main factors is the coding mismatch between training data and testing data. In speaker recognition over a network in particular, the available training data comes from one speech coder, whereas the testing data in actual use comes from another. In this situation, speaker recognition performance is seriously affected. To improve speaker recognition performance in network environments and make the system more practical, the speech coding mismatch problem must be resolved first; that is, the influence of the coding mismatch between training and testing conditions must be eliminated.
This paper studies compensation approaches that effectively overcome the impact of different speech coders, so as to improve speaker recognition performance in network environments. The approaches compensate mainly in the feature domain and the score domain. For feature compensation, the MAP (Maximum A Posteriori) method and the ML (Maximum Likelihood) method are applied to the speaker recognition system. For score compensation, the likelihood ratio score normalization method previously used for channel compensation is adopted to further improve system performance. Recognition is first performed with a GMM (Gaussian Mixture Model), a secondary decision is then made using coding score normalization, and the final recognition result is obtained. The baseline system is a text-independent speaker identification system.
Experimental results show that applying the MAP method for coding compensation first and then the likelihood score method for score compensation gives the best recognition rate, 83.4% in open-set tests.
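The two-stage decision described in the abstract (recognition with a GMM, followed by a secondary decision on a normalized likelihood ratio score) can be illustrated with a minimal sketch. It is not the thesis implementation: the diagonal-covariance GMMs, the use of a single background model for normalization, the function names `gmm_avg_loglik` and `identify_open_set`, the toy model parameters, and the threshold of 0 are all assumptions made for illustration.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (M,);
    means and variances: (M, D). All shapes are assumptions of this sketch.
    """
    diff = frames[:, None, :] - means[None, :, :]                     # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def identify_open_set(frames, speaker_models, background_model, threshold):
    """First pass: pick the best-scoring speaker GMM.
    Second pass: accept only if the log-likelihood ratio against the
    (hypothetical) background model reaches the threshold."""
    raw = {name: gmm_avg_loglik(frames, *params)
           for name, params in speaker_models.items()}
    best = max(raw, key=raw.get)
    llr = raw[best] - gmm_avg_loglik(frames, *background_model)
    return (best if llr >= threshold else None), llr

# Toy single-component models in two dimensions (illustrative values only).
one = np.array([1.0])
speakers = {
    "A": (one, np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "B": (one, np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]])),
}
background = (one, np.array([[2.5, 2.5]]), np.array([[25.0, 25.0]]))

near_a = np.array([[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]])
outlier = np.array([[20.0, -20.0], [21.0, -19.0]])

label_near, llr_near = identify_open_set(near_a, speakers, background, 0.0)
label_out, llr_out = identify_open_set(outlier, speakers, background, 0.0)
print(label_near, label_out)
```

In this sketch the frames close to speaker A's mean are accepted as A, while the far-away frames are rejected as an unknown speaker because the broad background model explains them better than any enrolled model. In a coder-mismatch setting, the background model would plausibly be trained on speech from the coder seen at test time, but that choice is also an assumption here.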