A Study on Improving the Performance of a CASA System for Speech Segregation [Korean-language thesis]

Category: free Korean-language theses. Editor: 金一 (teaching assistant). Updated: 2017-04-27

As the information-oriented society develops, research on speech interfaces between humans and machines is actively underway. Because background noise degrades machine performance, a wide range of applications such as speech recognition, navigation, home automation, and hearing aids are difficult to use in real life. Speech enhancement and noise reduction algorithms that reduce the effect of noise are therefore fundamental and vital. Spectral subtraction and the Wiener filter perform well in stationary noise, but real environments are nonstationary, and in them the performance of these algorithms drops drastically.
To solve this problem, a speech segregation algorithm that separates the speech signal from the noise signal is required. Existing algorithms that improve speech quality through speech segregation fall into two categories: those that use a monaural microphone and those that use multiple microphones. ICA (Independent Component Analysis) with multiple microphones can segregate speech effectively, but it is hard to apply under spatial constraints. Among the monaural approaches, CASA (Computational Auditory Scene Analysis), which imitates the human auditory system, can separate the speech component effectively.
CASA models the human auditory organs: the input signal is decomposed into the time-frequency domain with a gammatone filterbank spaced on the ERB (equivalent rectangular bandwidth) scale, and the resulting features are used to separate the speech on the basis of pitch, amplitude modulation, onset and offset, and harmonic information.
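The front end described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the Glasberg and Moore ERB formulas and the standard fourth-order gammatone impulse response, and the channel count and frequency range in the usage note are hypothetical values that the abstract does not specify.

```python
import math
from typing import List


def hz_to_erb_rate(f_hz: float) -> float:
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37


def erb_bandwidth(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def center_frequencies(n_channels: int, f_low: float, f_high: float) -> List[float]:
    """Centre frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]


def gammatone_ir(fc: float, fs: float, duration: float = 0.025, order: int = 4) -> List[float]:
    """Unnormalised gammatone impulse response:
    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc)."""
    b = 1.019 * erb_bandwidth(fc)
    n = int(round(duration * fs))
    ir = []
    for k in range(n):
        t = k / fs
        ir.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
                  * math.cos(2.0 * math.pi * fc * t))
    return ir
```

For example, `center_frequencies(8, 80.0, 5000.0)` gives eight centre frequencies from 80 Hz to 5 kHz. Convolving the input with each `gammatone_ir` and framing the outputs yields the time-frequency units on which segmentation and grouping operate.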
Segmentation and grouping are the two important steps of the CASA algorithm. In the segmentation step, the speech signal decomposed into the time-frequency domain is formed into segments, and a binary mask is constructed using temporal continuity and periodicity. In the grouping step, segments that come from the same source are combined into one stream. The existing CASA can separate speech efficiently with this binary mask. Before speech separation can be carried out, however, the regions that contain speech must be identified. Speech regions are hard to detect in a noisy environment, and poorly performed voice activity detection can cause speech loss. Background noise also makes the binary mask itself difficult to construct: if the noise has a periodicity similar to that of speech, the noise is regarded as speech and residual noise remains in the separated speech. This degrades the speech quality obtained with the existing CASA.
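The periodicity criterion, and the way periodic noise slips through it, can be illustrated with a small sketch. It assumes that a time-frequency unit is labeled 1 when its normalized autocorrelation at a candidate pitch lag exceeds a threshold; the function names and the threshold of 0.85 are hypothetical and are not taken from the thesis.

```python
import math
from typing import List


def normalized_autocorr(frame: List[float], lag: int) -> float:
    """Normalized autocorrelation of one T-F unit at the given lag (range -1..1)."""
    n = len(frame) - lag
    num = sum(frame[i] * frame[i + lag] for i in range(n))
    e0 = sum(frame[i] ** 2 for i in range(n))
    e1 = sum(frame[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(e0 * e1)
    return num / denom if denom > 0.0 else 0.0


def binary_mask(units: List[List[List[float]]], pitch_lag: int,
                threshold: float = 0.85) -> List[List[int]]:
    """units[c][m] holds the samples of channel c in time frame m.
    A unit is labeled 1 when it repeats at the pitch lag (assumed criterion)."""
    return [[1 if normalized_autocorr(frame, pitch_lag) > threshold else 0
             for frame in channel]
            for channel in units]
```

Any response that repeats at `pitch_lag` is labeled 1 whether it comes from voiced speech or from a periodic noise source, which is the failure mode described above.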
In this dissertation, we propose a voice activity detection algorithm and a segmentation algorithm to improve the speech separation and speech quality of CASA. In the first stage, the proposed voice activity detection uses the ratio of the periodic component to the aperiodic component of the cochleagram, which makes it possible to detect speech regardless of variations in SNR (signal-to-noise ratio).
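One way to form such a periodic-to-aperiodic ratio is sketched below. The abstract does not give the actual definition, so the decomposition used here is an assumption: within each channel the peak of the normalized autocorrelation over a pitch-lag range is treated as the periodic fraction of the frame energy. The function names and the 0 dB threshold are hypothetical.

```python
import math
from typing import List


def autocorr_peak(frame: List[float], min_lag: int, max_lag: int) -> float:
    """Largest normalized autocorrelation over the lag range, clamped to [0, 1]."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        e0 = sum(frame[i] ** 2 for i in range(n))
        e1 = sum(frame[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(e0 * e1)
        if denom > 0.0:
            best = max(best, num / denom)
    return min(best, 1.0)


def periodic_aperiodic_ratio_db(channel_frames: List[List[float]],
                                min_lag: int, max_lag: int,
                                eps: float = 1e-12) -> float:
    """Ratio (dB) of periodic to aperiodic energy summed over all channels
    of one time frame. The energy split r * E versus (1 - r) * E is an assumption."""
    periodic = 0.0
    aperiodic = 0.0
    for frame in channel_frames:
        energy = sum(x * x for x in frame)
        r = autocorr_peak(frame, min_lag, max_lag)
        periodic += r * energy
        aperiodic += (1.0 - r) * energy
    return 10.0 * math.log10((periodic + eps) / (aperiodic + eps))


def is_speech(channel_frames: List[List[float]], min_lag: int, max_lag: int,
              threshold_db: float = 0.0) -> bool:
    """Hypothetical decision rule: speech when the ratio exceeds the threshold."""
    return periodic_aperiodic_ratio_db(channel_frames, min_lag, max_lag) > threshold_db
```

Because the decision depends on a ratio within the frame rather than on absolute energy, it is less tied to the overall noise level than an energy threshold, which is in the spirit of the SNR robustness claimed above.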
In the second stage, the proposed segmentation algorithm is designed to improve speech quality by forming better speech segment groups and minimizing residual noise. The problem it addresses arises in the segmentation stage: when the binary mask is constructed, noise whose periodicity is similar to that of the voice is regarded as voice.
To make the cross-correlation similarity criterion stricter when the noise is periodic, the difference in delay time between the autocorrelation signals of the channels is converted into a weight, and the binary mask is reconstructed by applying this weight to the cross-correlation function. In the grouping step, the residual noise is minimized using the updated pitch information of the periodic component. By detecting the speech region accurately and minimizing residual noise, the proposed CASA improves speech separation and speech quality for continuous speech.
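The weighting idea can be sketched as follows. The abstract does not state how the delay difference is mapped to a weight, so the exponential mapping, the `scale` parameter, and all function names below are assumptions made for illustration only. The cross-channel correlation is computed between the mean- and variance-normalized autocorrelation functions of two channels.

```python
import math
from typing import List


def autocorr(frame: List[float], max_lag: int) -> List[float]:
    """Autocorrelation for lags 0..max_lag over a fixed-length window."""
    n = len(frame) - max_lag
    return [sum(frame[i] * frame[i + lag] for i in range(n))
            for lag in range(max_lag + 1)]


def normalize(values: List[float]) -> List[float]:
    """Shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def cross_channel_corr(acf_a: List[float], acf_b: List[float]) -> float:
    """Correlation between the normalized autocorrelations of two channels."""
    a, b = normalize(acf_a), normalize(acf_b)
    return sum(x * y for x, y in zip(a, b)) / len(a)


def peak_lag(acf: List[float], min_lag: int) -> int:
    """Lag of the largest autocorrelation value at or beyond min_lag."""
    return max(range(min_lag, len(acf)), key=acf.__getitem__)


def weighted_corr(acf_a: List[float], acf_b: List[float],
                  min_lag: int, scale: float = 5.0) -> float:
    """Cross-channel correlation scaled by a weight that shrinks as the
    peak delays of the two channels move apart (assumed mapping)."""
    delay_diff = abs(peak_lag(acf_a, min_lag) - peak_lag(acf_b, min_lag))
    weight = math.exp(-delay_diff / scale)
    return weight * cross_channel_corr(acf_a, acf_b)
```

Channels that share the same peak delay keep their full correlation, while channels whose peak delays disagree are penalized and are therefore less likely to pass the similarity threshold when the mask is rebuilt.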
In this dissertation, we use the ETRI speech database, a standard Korean common speech database, and a noise database consisting of white noise and six noises of PNL. Compared with the existing voice activity detection algorithm, the proposed one requires more computation, but the maximum average Corr (utterance correct rate) of speech region detection increases by up to 11.7% and 17.9% at SNRs of 5 dB and 0 dB, respectively. The proposed segmentation algorithm likewise adds up to 1 to 2 seconds of computation compared with the existing segmentation algorithm, but it improves speech segregation performance by 2 dB and 1.65 dB at SNRs of 5 dB and 0 dB, respectively. Compared with the existing CASA system, the proposed system detects voice activity more accurately and minimizes residual noise, so that speech quality is improved, with maximum averages of 6.17 dB and 6.29 dB at SNRs of 5 dB and 0 dB, respectively.
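The abstract does not say how the dB figures are measured. One common choice, assumed here purely for illustration, is the SNR of a signal measured against the clean reference, with the improvement taken as the difference between the separated output and the noisy input. The function names are hypothetical.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """SNR in dB of `estimate` measured against the clean `reference`."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal == 0.0:
        raise ValueError("reference signal has zero energy")
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / error)


def snr_gain_db(clean: Sequence[float], noisy: Sequence[float],
                separated: Sequence[float]) -> float:
    """Improvement in dB of the separated output over the noisy input."""
    return snr_db(clean, separated) - snr_db(clean, noisy)
```

Halving the error amplitude, for instance, quarters the error energy and gives a gain of about 6 dB.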
