Research on Robustness Enhancement in Speaker Recognition
|School||Beijing University of Posts and Telecommunications|
|Course||Signal and Information Processing|
|Keywords||speaker verification, text-independent, Gaussian mixture model, supervector, support vector machine, cross similarity measurement, robustness enhancement|
Speaker recognition, also known as voiceprint recognition, is an important branch of speech signal processing and one of the most active research fields in speech technology. It authenticates a speaker's identity based on his or her voice. This paper introduces the basic principles of speaker recognition, its research history and its current status, and focuses mainly on how to enhance the robustness of text-independent speaker recognition. Two dominant modeling methods, the Gaussian mixture model (GMM) and the support vector machine (SVM), are discussed. Techniques for robustness enhancement are analyzed in the feature, model and score domains respectively. The paper proposes a combined SVM system using GMM supervectors together with its test score normalization, and a cross similarity measurement together with its use for score normalization and symmetric scoring.

In this research, the widely used open-source speech recognition toolkit developed by Cambridge University is used to build the verification system. The final system employs PLP feature parameters, and various techniques, such as RASTA filtering, feature compensation and transformation, model adaptation, score normalization and nuisance attribute projection, are used to optimize it. As the experimental results show, the final recognition system performs well and approaches the state-of-the-art level according to the results of the 2006 and 2008 NIST Speaker Recognition Evaluations announced by the National Institute of Standards and Technology.

The experiments in this paper were mainly conducted on a text-independent speaker verification system over telephone and microphone channels. However, many of the techniques and methods are also useful as a reference for other speaker verification tasks, for speaker identification and even for speech recognition.
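
To make the score-domain idea concrete, the following is a minimal sketch of standard test score normalization (T-norm), assuming the raw trial score and the scores of the same test utterance against a cohort of impostor models are already available. The function name `t_norm` and its parameters are hypothetical and chosen for illustration only; this is not the paper's implementation and does not cover the proposed cross similarity measurement.

```python
import numpy as np


def t_norm(raw_score, cohort_scores):
    """Standardize a trial score using cohort statistics (T-norm).

    raw_score: score of the test utterance against the claimed speaker model.
    cohort_scores: scores of the same test utterance against a set of
        impostor (cohort) models. Both names are illustrative assumptions.
    """
    cohort = np.asarray(cohort_scores, dtype=float)
    mu = cohort.mean()      # mean of the impostor scores for this utterance
    sigma = cohort.std()    # population standard deviation of those scores
    if sigma == 0.0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma
```

Because the cohort statistics are computed per test utterance, the normalized score is less sensitive to utterance-dependent shifts in the raw score distribution, which is one reason this kind of normalization is commonly applied before a global decision threshold.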