The development of today's computing environment is being driven by applications that provide user-centric services. User interface technologies are clearly important for emerging domains such as ubiquitous, pervasive, and wearable computing environments. In such environments, speech recognition is one of the most useful interface technologies. However, because current noise-canceling technology remains immature, it may not be possible to realize a high-performance speech recognition system in a noisy environment such as a street, a hall, or a factory.
In this thesis, we present an extended ZCPA model for a speaker-independent speech recognition system that uses a throat microphone, and we propose a valid frequency bound for analyzing throat signals. A throat microphone minimizes the impact of environmental noise. However, because the throat signal lacks high frequencies and partially loses formant frequencies, previous systems using throat microphones have shown lower recognition rates than systems that use standard microphones. We therefore consider a new methodology for enhancing the speech features extracted from a throat microphone signal.
First, we show experimentally that the Mel-cepstrum feature is not suitable for analyzing throat signals. To solve this problem, we present an extended ZCPA model for throat signal recognition. The new model is composed of two features: the zero-crossing-based peak and the zero-crossing rate. We then propose a valid frequency bound for throat signal analysis based on several methods, including analysis of the band-pass filter output, analysis of the feature distribution among speakers, and correlation analysis among the feature vectors of individual speakers. The results show that the valid frequency bound for throat signal analysis is between 200 Hz and 2001 Hz.
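The two feature types named above can be illustrated with a minimal sketch. It is not the extended model of the thesis: it assumes a single already band-limited frame instead of a filter bank, linearly spaced histogram bins, and a log(1 + peak) weighting, and the names `zcpa_frame`, `n_bins`, `f_lo`, and `f_hi` are hypothetical. Intervals between upward zero crossings are converted to frequencies, frequencies outside the valid bound are discarded, and the peak amplitude inside each interval is accumulated into a frequency histogram; the zero-crossing rate is computed alongside.

```python
import math

def zcpa_frame(frame, fs, f_lo=200.0, f_hi=2001.0, n_bins=16):
    """Return (frequency histogram, zero-crossing rate) for one frame.

    Simplified single-channel sketch; bin layout and weighting are assumptions.
    """
    # Sample indices where the signal crosses zero going upward.
    ups = [i for i in range(1, len(frame)) if frame[i - 1] < 0.0 <= frame[i]]
    hist = [0.0] * n_bins
    for a, b in zip(ups, ups[1:]):
        freq = fs / (b - a)  # inverse of the interval between upward crossings
        if not (f_lo <= freq <= f_hi):
            continue  # outside the valid frequency bound: skip
        peak = max(frame[a:b])  # peak amplitude inside the interval
        # Linearly spaced bins over [f_lo, f_hi] (an assumption of this sketch).
        k = min(int((freq - f_lo) / (f_hi - f_lo) * n_bins), n_bins - 1)
        hist[k] += math.log(1.0 + peak)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    changes = sum(
        1 for i in range(1, len(frame)) if (frame[i - 1] < 0.0) != (frame[i] < 0.0)
    )
    zcr = changes / (len(frame) - 1)
    return hist, zcr

# Usage: a 500 Hz tone sampled at 8 kHz lands in a single histogram bin,
# while a 100 Hz tone falls below the bound and contributes nothing.
fs = 8000
tone_500 = [math.sin(2 * math.pi * 500 * n / fs + 0.3) for n in range(400)]
tone_100 = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(400)]
hist_500, zcr_500 = zcpa_frame(tone_500, fs)
hist_100, zcr_100 = zcpa_frame(tone_100, fs)
```

Restricting the histogram to the valid bound is also what makes the feature cheaper: crossings outside 200 Hz to 2001 Hz are never binned or stored.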
Finally, we propose three methods to improve the performance of the system. First, in the feature extraction module, we reduce the time and space complexity of the ZCPA model by using the valid frequency bound for throat signal analysis. Second, applying a RASTA filter for channel normalization causes a feature distortion problem; to solve it, we propose using a Peak Mean Subtraction filter. Third, we propose a TDNN (time-delay neural network) structure for training on throat signal features.
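As a rough illustration of the time-delay idea, the sketch below applies one TDNN-style layer to a sequence of feature frames. It shows the general technique only, not the network structure proposed in the thesis: the function name `tdnn_layer`, the context offsets, the tanh activation, and the toy weights are all assumptions. The same weights are applied to every spliced window of frames, which is what makes the layer insensitive to where in time a pattern occurs.

```python
import math

def tdnn_layer(frames, weights, bias, context=(-1, 0, 1)):
    """Apply one time-delay layer to a sequence of feature vectors.

    frames  : list of T feature vectors, each of length D
    weights : list of H rows, each of length len(context) * D
    bias    : list of H biases
    Returns one hidden vector per time step that has a full context window.
    """
    start = max(0, -min(context))
    stop = len(frames) - max(0, max(context))
    out = []
    for t in range(start, stop):
        # Splice the frames at the chosen delays into one input vector.
        window = [x for d in context for x in frames[t + d]]
        # The same weights are reused at every time step t.
        out.append([
            math.tanh(sum(w * x for w, x in zip(row, window)) + b)
            for row, b in zip(weights, bias)
        ])
    return out

# Usage: four 1-dimensional frames and one hidden unit that responds to the
# difference between the previous and the next frame.
frames = [[1.0], [2.0], [3.0], [4.0]]
hidden = tdnn_layer(frames, weights=[[1.0, 0.0, -1.0]], bias=[0.0])
```

Stacking such layers widens the temporal context seen by later layers, at the cost of shortening the output sequence by the context width at each layer.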
To evaluate the performance of the proposed scheme, we designed and conducted a throat signal recognition experiment. The experimental data consist of 50 words spoken by 126 male speakers. One hundred sets were used for training and the other 26 sets were used as test patterns. A recognition system using Mel-cepstrum shows unsatisfactory recognition performance of about 67%, whereas our system using the extended ZCPA model achieves about 89.2%. The recognition system described here makes a significant contribution to the development of a useful speech interface for noisy environments.