Research on HMM-Based Cross-Lingual Speech Synthesis
|School||University of Science and Technology of China|
|Course||Signal and Information Processing|
|Keywords||Hidden Markov models; Speech synthesis; Decision tree clustering; Cross-lingual model adaptation|
Speech synthesis generates artificial human speech, giving computers the ability to speak as people do. Most current research focuses on text-to-speech (TTS), that is, converting arbitrary text in a given language into speech. With the rapid development of speech technology over the years, synthesis techniques have matured considerably, and the quality and naturalness of synthesized speech have improved markedly. Nevertheless, how to further improve system performance and make computer-synthesized speech clearer and more natural remains a central concern of the field. In addition, as international exchange becomes increasingly frequent, communicating in a single language often fails to meet people's needs, so cross-lingual speech synthesis systems are urgently required. The focus of this thesis is how to perform cross-lingual speaker adaptation when no target-language data from the speaker is available, and thereby build a cross-lingual speech synthesis system that facilitates international communication.

The thesis is organized as follows. Chapter 1 briefly reviews the research background: it first introduces the technical needs and application background of cross-lingual speech synthesis, then surveys several existing mainstream speech synthesis methods, and finally outlines the main research direction of this work, HMM-based cross-lingual speech synthesis.
The first half of Chapter 2 introduces the basic framework, training pipeline, and key techniques of the most widely used trainable HMM-based speech synthesis approach (trainable TTS). The second half describes in detail the speaker-adaptation framework and related algorithms used within this synthesis system for same-language (intra-lingual) adaptation. These two topics form the foundation of this work and the starting point for the research in the subsequent chapters.

Chapter 3 describes improvements to the baseline system of Chapter 2. It starts from the module of a parametric speech synthesis system that is most strongly tied to the language, decision-tree-based model clustering, and studies how to improve the synthesis quality of the existing baseline. The study examines different splitting criteria for decision-tree question selection and different stopping conditions, and measures how these criteria and their combinations affect the final clustering result and the synthesized speech.

Chapter 4 starts from the idea of phoneme mapping to achieve cross-lingual Chinese-English speaker adaptation. To address the poor performance of naive phoneme mapping in cross-lingual speaker adaptation, the chapter combines adaptive-data selection with a corrected and improved Chinese-to-English phoneme mapping table, and further exploits prosodic information shared across the two languages through tone-type mapping and English stress/rhythm mapping, thereby improving the adaptation result.
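The decision-tree clustering studied in Chapter 3 splits HMM states by yes/no phonetic questions, choosing at each node the question that most increases the training-data likelihood; a stopping criterion (e.g., a minimum gain or an MDL-style penalty) then decides whether the split is accepted. The sketch below illustrates only the likelihood-gain computation under the common assumption of a single diagonal-covariance Gaussian per node; the function names and toy questions are illustrative, not from the thesis:

```python
import numpy as np

def node_log_likelihood(frames):
    """Approximate log-likelihood of the frames under a single
    diagonal-covariance Gaussian fitted to them (the standard
    assumption in tree-based state clustering)."""
    n, _ = frames.shape
    var = frames.var(axis=0) + 1e-8          # per-dimension variance
    return -0.5 * n * np.sum(np.log(2 * np.pi * var) + 1.0)

def split_gain(frames, answers):
    """Log-likelihood gain from splitting a node by one yes/no
    phonetic question (`answers` is a boolean mask over frames)."""
    yes, no = frames[answers], frames[~answers]
    if len(yes) == 0 or len(no) == 0:
        return -np.inf                       # degenerate split
    return (node_log_likelihood(yes) + node_log_likelihood(no)
            - node_log_likelihood(frames))

def best_question(frames, questions):
    """Pick the question with the largest likelihood gain; a stopping
    criterion would compare this gain against a threshold or an
    MDL model-size penalty before accepting the split."""
    gains = {name: split_gain(frames, mask)
             for name, mask in questions.items()}
    return max(gains, key=gains.get), gains
```

On synthetic data, a question that separates two well-spaced clusters yields a much larger gain than a random partition, which is exactly the behavior the splitting criterion exploits.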
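The phoneme-mapping idea of Chapter 4 can be illustrated with a toy lookup: each English phone is replaced by the closest phone of the Chinese model set, so that Chinese-trained HMMs can be driven by English adaptation data. The table entries, names, and fallback behavior below are purely illustrative assumptions, not the corrected mapping table the thesis constructs:

```python
# Hypothetical English -> Chinese phone mapping (illustrative entries
# only; a real table covers the full phone inventories of both languages).
PHONE_MAP = {
    "AE": "a",
    "IY": "i",
    "UW": "u",
    "S":  "s",
    "SH": "sh",
    "TH": "s",   # Chinese has no dental fricative; fall back to /s/
}

def map_phone_sequence(english_phones, table=PHONE_MAP, fallback="sil"):
    """Replace each English phone with its mapped Chinese phone so that
    English material can be aligned against Chinese-trained models;
    phones missing from the table fall back to a default/silence model."""
    return [table.get(p, fallback) for p in english_phones]
```

The thesis's refinements (adaptive-data selection, tone-type mapping, stress/rhythm mapping) all address weaknesses of exactly this kind of one-to-one segmental lookup, which by itself ignores prosody and phones with no close equivalent.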
On the basis of the research above, this thesis implements a Chinese-English cross-lingual speech synthesis system. The system can mimic both the Chinese and the English pronunciation of any given Chinese speaker; even when no English recordings of that speaker are available, it can still synthesize English utterances that carry the speaker's voice characteristics well.