HMM-Based Korean Speech Synthesis Using a Small Speech DB

Category: Free Korean-language papers. Editor: 金一 (teaching assistant). Updated: 2017-04-28

Corpus-based unit-concatenation text-to-speech (TTS) systems are now widely used because of the high quality of their synthesized speech. This quality is obtained by building the system on a large speech DB. However, collecting a large, phonetically balanced speech DB and segmenting it to extract synthesis units with various voice characteristics is difficult and very costly. Such systems are therefore generally used for server-based TTS and are hard to apply to embedded systems such as mobile devices, which have limited memory. HMM-based text-to-speech systems (HTS), on the other hand, have recently drawn much attention as a way to overcome this problem. HTS uses a statistical model, the hidden Markov model (HMM), as the synthesis unit to represent the spectral and prosodic characteristics of the speech signal. The synthesis engine therefore needs less memory and less computation, and is suitable for embedded systems. It also has the advantage that the voice characteristics of the synthetic speech can be modified easily by transforming the HMM parameters appropriately.
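As one illustration of modifying voice characteristics through the model parameters, the sketch below raises or lowers the pitch of a voice by shifting the Gaussian means of a log-F0 stream. It assumes that the HMM states model F0 in the log domain, which the abstract does not state, and the function name `shift_logf0_means` is hypothetical rather than part of the HTS software.

```python
import math
from typing import List


def shift_logf0_means(logf0_means: List[float], pitch_ratio: float) -> List[float]:
    """Return new log-F0 state means with pitch scaled by pitch_ratio.

    Assumption: each value is the mean of a Gaussian over log F0 (natural log, Hz).
    Adding log(pitch_ratio) in the log domain multiplies F0 by pitch_ratio.
    """
    if pitch_ratio <= 0:
        raise ValueError("pitch_ratio must be positive")
    offset = math.log(pitch_ratio)
    return [mean + offset for mean in logf0_means]


# Example: make a voice 20% higher in pitch.
original_means = [math.log(100.0), math.log(120.0)]
shifted_means = shift_logf0_means(original_means, 1.2)
```

Because only the stored means change, a modification of this kind needs no additional speech data, which is what makes parameter transformation attractive compared with re-recording a unit-selection corpus.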
In this thesis, we implemented an HMM-based Korean text-to-speech system using a small Korean speech DB. We used the HTS software released on the Internet, together with some of the data from the ETRI 611 DB and the SeoulMal DB. The ETRI 611 DB, a set of 611 phoneme-labeled words originally built for training speech recognition systems, was used to generate initial HMMs representing context-independent monophone acoustic models. Starting from these monophone HMMs, the SeoulMal DB was then used to generate context-dependent triphone HMMs. As contextual factors for the context-dependent HMMs, we used the {preceding, current, succeeding} phonemes, the position of the current phoneme in the current phrase, and the number of syllables in the current phrase.
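The contextual factors listed above can be pictured as one label string per phoneme. The following is a minimal sketch, not the label format actually used in the thesis or by HTS: the `Phrase` type, the `context_labels` function, the `sil` boundary symbol, the `/pos:` and `/syl:` fields, and the romanized phoneme symbols are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phrase:
    phonemes: List[str]  # phoneme symbols of the phrase, in order
    num_syllables: int   # number of syllables in the phrase


def context_labels(phrases: List[Phrase], boundary: str = "sil") -> List[str]:
    """Build one context-dependent label per phoneme.

    Each label encodes the preceding, current, and succeeding phonemes,
    the 1-based position of the current phoneme in its phrase, and the
    number of syllables in that phrase. The format is illustrative only.
    """
    # Flatten the utterance while remembering per-phrase context.
    flat = []
    for phrase in phrases:
        for pos, phoneme in enumerate(phrase.phonemes, start=1):
            flat.append((phoneme, pos, phrase.num_syllables))

    labels = []
    for i, (phoneme, pos, n_syl) in enumerate(flat):
        # Neighbours cross phrase boundaries; utterance edges use the boundary symbol.
        prev_ph = flat[i - 1][0] if i > 0 else boundary
        next_ph = flat[i + 1][0] if i + 1 < len(flat) else boundary
        labels.append(f"{prev_ph}-{phoneme}+{next_ph}/pos:{pos}/syl:{n_syl}")
    return labels


# Example with two short phrases (phoneme symbols are placeholders).
utterance = [Phrase(["a", "n"], 1), Phrase(["n", "jv", "ng"], 2)]
labels = context_labels(utterance)
```

Labels of this kind are what allow HMMs for rare contexts to be pooled by shared factors during training, which matters when the speech DB is small.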
The synthesized speech was highly intelligible, with vocoded speech quality, although its naturalness was insufficient. We attribute this to the prosodic feature parameters not being modeled well during HMM training because of the limited speech DB. We therefore improved the naturalness of the synthesized speech slightly by simply controlling the pitch pattern of the phrase and sentence. The implemented HMM-based Korean text-to-speech system was about 1.3 Mbytes in size, small enough to be used in embedded systems.
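The abstract does not say how the pitch pattern was controlled. As one possible form of such simple control, the sketch below imposes a falling pitch trend on a generated F0 contour by scaling voiced frames with a factor that moves from `start_scale` to `end_scale` across the phrase. The function name, the default scale values, and the convention that unvoiced frames carry an F0 of 0 are assumptions for illustration.

```python
import math
from typing import List


def apply_declination(
    f0: List[float], start_scale: float = 1.1, end_scale: float = 0.9
) -> List[float]:
    """Scale voiced F0 values (Hz) with a factor interpolated in the log domain.

    The factor goes from start_scale at the first frame to end_scale at the
    last frame. Frames with F0 <= 0 are treated as unvoiced and left at 0.
    """
    n = len(f0)
    log_start, log_end = math.log(start_scale), math.log(end_scale)
    out = []
    for i, value in enumerate(f0):
        t = i / (n - 1) if n > 1 else 0.0  # relative position in the phrase
        scale = math.exp(log_start + t * (log_end - log_start))
        out.append(value * scale if value > 0 else 0.0)
    return out


# Example: a flat 100 Hz contour with one unvoiced frame in the middle.
contour = [100.0, 0.0, 100.0]
adjusted = apply_declination(contour)
```

A rule of this kind operates on the generated contour only, so it adds nothing to the stored model size.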
