High-Naturalness Voice Conversion Based on VQ Models and a BP Network
|Course||Signal and Information Processing|
|Keywords||Voice conversion; VQ model; suprasegmental features; BP network|
Voice conversion is the technique of modifying a source speaker's speech so that it sounds as if it were spoken by a target speaker. It has a wide range of applications, such as text-to-speech (TTS) systems, dubbing systems, and confidential communications. This paper presents a high-naturalness voice conversion method based on vector quantization (VQ) models and a BP neural network. The algorithm has three parts: the first two use VQ models to convert the speech spectral envelope and its excitation, and the third uses the BP algorithm to model the prosody conversion rules. For the waveform characteristics of pitch-period residuals, a circular cross-correlation function is proposed, which clusters the residual waveforms effectively. Prosody is adjusted according to the suprasegmental characteristics of Chinese pronunciation, so that Chinese speech is converted effectively and the synthesized speech has high naturalness. The main work of this paper is as follows. (1) Spectral envelope conversion based on a VQ model. The spectral envelope is described by 20th-order LPC coefficients, which are converted to line spectral frequency (LSF) coefficients. Compared with LPC parameters, LSFs have better interpolation and quantization properties. Training yields 128 code vectors for the source speech and 128 code vectors for the target speech, together with a mapping from each source code vector to the target codebook; each entry of the mapping codebook is a linearly weighted combination of the target code vectors. After conversion, the LSF coefficients are closer to those of the target speaker. (2) Excitation conversion based on a VQ model. The residual is converted in two stages: first, the residual energy is converted with a linear transformation; second, the residual waveform is converted with a VQ codebook mapping model.
In the residual waveform conversion, a circular cross-correlation function is defined, and the negative of its maximum value is used as the distance measure between waveforms. After conversion, the speech residual signal retains the information of the target speaker. (3) Modeling of prosody conversion rules with the BP algorithm. The relative fundamental-frequency (F0) contours of the source speaker and the target speaker are extracted, and a three-layer BP network is trained to obtain the mapping weights. The mean F0 of the target speech is then added to the converted relative F0 contour to obtain the converted F0 contour. Because prosody is adjusted according to the suprasegmental characteristics of Chinese pronunciation, the algorithm converts Chinese speech effectively and produces highly natural synthesized speech. Experiments show that it is an effective Chinese voice conversion algorithm.
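
The residual-waveform distance described in part (2) can be illustrated with a minimal sketch. The abstract states only that the distance is the negative of the maximum circular cross-correlation value; everything else here is an assumption made for illustration: the function names, the requirement that both pitch-cycle residuals have already been brought to the same length, and the unit-energy normalization (chosen because the abstract says residual energy is converted in a separate step). It is not the author's implementation.

```python
import numpy as np


def unit_energy(x):
    """Scale a waveform to unit energy (assumption: energy is handled separately)."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    return x if norm == 0.0 else x / norm


def circular_xcorr(x, y):
    """Circular cross-correlation r[k] = sum_n x[n] * y[(n + k) mod N]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected two 1-D arrays of equal length")
    # np.roll(y, -k)[n] equals y[(n + k) mod N], so each dot product is one lag.
    return np.array([np.dot(x, np.roll(y, -k)) for k in range(len(x))])


def waveform_distance(x, y):
    """Negative of the maximum circular cross-correlation (smaller = more similar)."""
    return -np.max(circular_xcorr(unit_energy(x), unit_energy(y)))


# Hypothetical pitch-cycle residuals, all resampled to N samples.
N = 64
n = np.arange(N)
cycle = np.sin(2 * np.pi * n / N) + 0.5 * np.sin(4 * np.pi * n / N)
shifted = np.roll(cycle, 5)          # same shape, different starting phase
other = np.cos(6 * np.pi * n / N)    # a different waveform shape

d_same = waveform_distance(cycle, shifted)
d_other = waveform_distance(cycle, other)
```

Because the correlation is maximized over all circular lags, a circularly shifted copy of a cycle gets a distance close to -1 under this normalization, while a differently shaped cycle gets a larger (less negative) value. That shift invariance is presumably what makes the measure suitable for clustering pitch-cycle residuals whose starting point within the period is arbitrary.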