Corpus-based unit-concatenation text-to-speech (TTS) systems are now widely used because they produce high-quality synthesized speech. That quality comes from the large speech database (DB) used to build the system. However, collecting a large, phonetically balanced speech DB and segmenting it to extract synthesis units with various voice characteristics is difficult and costly. As a result, corpus-based TTS is generally deployed in server-based systems and is hard to apply to embedded systems such as mobile devices, where memory is limited. The HMM-based text-to-speech system (HTS) has recently drawn much attention as a way to overcome this problem. HTS uses a statistical model, the hidden Markov model (HMM), as the synthesis unit to represent the spectral and prosodic characteristics of the speech signal. The synthesis engine therefore needs less memory and has low computational complexity, which makes it suitable for embedded systems. HTS also has the advantage that the voice characteristics of the synthetic speech can be modified easily by transforming the HMM parameters appropriately.