GMM- and SVM-based text-independent speaker verification method
|School||University of Science and Technology of China|
|Course||Signal and Information Processing|
|Keywords||Speaker Verification; University of Science and Technology of China; Feature Transform; Speaker model; Text-independent; PhD thesis; Speaker Recognition; Characteristic data; Kernel function; Training data|
Text-independent speaker verification on telephone speech has become one of the important fields of speaker recognition. The Support Vector Machine (SVM) is a discriminative approach that seems well suited to speaker verification. However, when cepstral features such as MFCC are used for text-independent speaker verification, a large amount of speech is needed, and as a modeling technique SVM has great difficulty handling that much training data. As a generative model, the Gaussian Mixture Model (GMM) has become the dominant modeling approach in text-independent speaker verification because of its robustness and scalability. A GMM can easily model the statistical distribution of the training data, and its likelihood measures the similarity between the model and the test data. However, a GMM is trained only on the target speaker's training data, which is just one class of the training data.

In this thesis, we develop the techniques required for SVM to work well on text-independent speaker verification using GMM. The main research work focuses on the SVM-based speaker modeling strategy, feature transformation for SVM, threshold setting, and score normalization.

First, when training speaker models for text-independent speaker verification, we train one SVM model for every target speaker. To reduce the number of impostors used in SVM training, this thesis proposes two novel GMM-based selection methods that choose a few typical impostors closest to the target speaker. The new methods make the SVM models more discriminative.

Second, this thesis proposes a new speaker verification approach based on GMM-based feature transformation and SVM. When a GMM is used for clustering, a small number of typical feature vectors are extracted from a large amount of speech data, which makes it much easier to train SVM models.
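The GMM-based impostor selection described above can be illustrated with a minimal sketch. The abstract does not say how the two proposed selection methods work, so the criterion used here (ranking impostors by the average log-likelihood of their frames under the target speaker's GMM) is an assumption, as are the function names, the diagonal-covariance form, and the toy data.

```python
import math


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(ll)
    # Log-sum-exp over components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance under a GMM."""
    weights, means, variances = gmm
    total = sum(diag_gmm_loglik(x, weights, means, variances) for x in frames)
    return total / len(frames)


def select_impostors(target_gmm, impostor_frames, k):
    """Return the k impostor ids that score highest under the target GMM.

    Assumed criterion: a higher average log-likelihood means the impostor
    is closer to the target speaker.
    """
    ranked = sorted(
        impostor_frames,
        key=lambda name: avg_loglik(impostor_frames[name], target_gmm),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional example with hypothetical values.
target_gmm = (
    [0.5, 0.5],                  # mixture weights
    [[0.0, 0.0], [1.0, 1.0]],    # component means
    [[1.0, 1.0], [1.0, 1.0]],    # diagonal variances
)
impostors = {
    "imp_near": [[0.1, 0.2], [0.9, 1.1]],
    "imp_mid": [[2.0, 2.0], [2.5, 1.5]],
    "imp_far": [[8.0, 8.0], [9.0, 7.5]],
}
selected = select_impostors(target_gmm, impostors, k=2)
print(selected)
```

Only the selected impostors would then serve as negative examples when training the target speaker's SVM, which keeps the training set small.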
Because of its better scalability, robustness, and especially comparability, we replace the GMM with an adapted GMM, which is derived from the UBM (Universal Background Model) by MAP adaptation with the corresponding training data. Based on the characteristics of the UBM-MAP GMM, this thesis proposes an improved feature transformation method in which the GMM mean vectors are normalized by the UBM. Experiments on NIST data showed a 21.6% relative reduction in EER compared with the GMM-UBM baseline.

Third, this thesis proposes another novel text-independent speaker verification system based on GMM and SVM that combines the advantages of both. In the new method, the GMM is used not only as a feature transformation method to extract a few discriminative GMM multi-likelihood vectors for the SVM, but also as a target speaker model for speaker verification. Experiments on text-independent speaker verification with NIST data showed a 14.9% relative improvement over the baseline GMM-UBM system.

Finally, some work focuses on threshold setting and score normalization. In speaker verification, the parameters of the bimodal distribution of output scores differ from one target speaker model to another, which makes it difficult to estimate a common threshold. In this thesis, we propose a novel score normalization, TZ normalization, which combines the traditional Zero normalization and Test normalization. Text-independent speaker verification experiments on NIST data showed significant improvements for this new technique over the traditional techniques.
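To make the score normalization idea concrete, here is a minimal sketch of Zero normalization, Test normalization, and one way of chaining them. The abstract does not state how TZ normalization combines the two, so the order used here (Z-normalize the target and cohort scores first, then T-normalize the result) is an assumption, and all function names and scores are hypothetical.

```python
from statistics import mean, stdev


def z_norm(raw_score, impostor_scores_for_model):
    """Zero normalization: rescale a score with the mean and sample standard
    deviation of impostor utterances scored against the same speaker model."""
    mu = mean(impostor_scores_for_model)
    sigma = stdev(impostor_scores_for_model)
    return (raw_score - mu) / sigma


def t_norm(raw_score, cohort_scores_for_test):
    """Test normalization: rescale a score with the mean and sample standard
    deviation of the same test utterance scored against cohort models."""
    mu = mean(cohort_scores_for_test)
    sigma = stdev(cohort_scores_for_test)
    return (raw_score - mu) / sigma


def tz_norm(raw_score, model_impostor_scores, cohort_raw_scores,
            cohort_impostor_scores):
    """Assumed chaining: Z-normalize the target score and every cohort score
    with its own model statistics, then T-normalize across the cohort."""
    z_target = z_norm(raw_score, model_impostor_scores)
    z_cohort = [
        z_norm(score, impostor_scores)
        for score, impostor_scores in zip(cohort_raw_scores,
                                          cohort_impostor_scores)
    ]
    return t_norm(z_target, z_cohort)


# Toy scores (hypothetical): one target model and three cohort models.
score = tz_norm(
    raw_score=2.0,
    model_impostor_scores=[0.0, 1.0, 2.0],
    cohort_raw_scores=[1.0, 3.0, 5.0],
    cohort_impostor_scores=[[0.0, 1.0, 2.0]] * 3,
)
print(score)
```

Because both steps shift and scale scores per model and per test utterance, the normalized scores of different target speakers become more comparable, which is what makes a single shared threshold easier to set.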