Recurrent Neural Network for Discrete Input and Its Application to a Korean Syllable-Based Language Model

Category: Korean-language theses · Editor: Jin Yi (TA) · Updated: 2017-04-27


The thesis aims to suggest a Recurrent Neural Network (RNN) for discrete input and to implement a Korean syllable-based RNN language model. The main components of a continuous speech recognition system are a vocabulary, an acoustic model, and a language model. Deep learning algorithms are investigated for a Korean unlimited-vocabulary continuous speech recognition system in three components: 1) the recognition units in the vocabulary, 2) acoustic modeling using a deep neural network based on data parallelism, and 3) language modeling using a recurrent neural network for discrete input.
Recognition units used in conventional Korean speech recognition systems are mainly classified into full-word and sub-word units based on morphological analysis. These recognition units suffer from the Out-Of-Vocabulary (OOV) problem: combinations of the recognition units cannot represent every word in Korean. To solve the OOV problem, a maximum-likelihood-based automatic vocabulary generation method for Korean sub-words is proposed. The method relies on a Maximum Likelihood (ML) criterion and requires no prior linguistic knowledge such as morphological analysis or space segmentation; it is completely automatic. It begins with an initial vocabulary of approximately 3,900 Korean syllables defined in EUC-KR, and the vocabulary is expanded by repeatedly adding the pair of vocabulary units that yields the maximum increase in the likelihood the language model assigns to the corpora.
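The pair-merging loop described above can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it scores each candidate merge by the change in corpus log-likelihood under a plain unigram model, whereas the thesis uses the likelihood of its language model; the function names and the unigram objective are assumptions.

```python
import math
from collections import Counter

def merge_pair(sent, pair, merged):
    """Replace every adjacent occurrence of `pair` in a sentence with `merged`."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out

def expand_vocabulary(corpus, num_merges):
    """Greedy vocabulary expansion: repeatedly merge the adjacent unit pair
    whose merge most increases the (unigram) log-likelihood of the corpus.
    `corpus` is a list of sentences, each a list of syllable units."""
    for _ in range(num_merges):
        unit_counts = Counter(u for s in corpus for u in s)
        pair_counts = Counter(
            (s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
        total = sum(unit_counts.values())

        def gain(pair):
            # Log-likelihood gain of replacing every occurrence of the pair
            # with one merged unit, under a unigram model (approximation).
            a, b = pair
            c_ab = pair_counts[pair]
            c_a, c_b = unit_counts[a], unit_counts[b]
            new_total = total - c_ab  # each merge removes one token
            def ll(c, n):
                return c * math.log(c / n) if c > 0 else 0.0
            old = ll(c_a, total) + ll(c_b, total)
            new = (ll(c_a - c_ab, new_total) + ll(c_b - c_ab, new_total)
                   + ll(c_ab, new_total))
            return new - old

        best = max(pair_counts, key=gain)
        corpus = [merge_pair(s, best, best[0] + best[1]) for s in corpus]
    return corpus
```

Starting from single syllables, frequent pairs such as 안+녕 would be merged first, so the vocabulary grows toward longer sub-word units without ever losing the ability to spell out unseen words syllable by syllable.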
A distributed DNN acoustic model training method based on data parallelism is needed to train an acoustic model on large-scale training data. To speed up DNN training for acoustic modeling, some researchers have used multi-core CPU systems instead of single-core systems so that DNN training becomes a parallel task. However, the number of CPU cores was limited because of their price and power consumption. A GPU, on the other hand, can run thousands of threads simultaneously on its available cores with little overhead, which makes it well suited to parallel computing, especially for iterative and simple computations. For large amounts of speech training data, a distributed DNN acoustic model training method based on data parallelism is therefore proposed using the Sun Grid Engine (SGE).
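The core of synchronous data parallelism can be sketched in a few lines: each worker computes gradients on its own shard of the minibatch, the gradients are averaged, and every replica applies the identical update. This is a minimal single-process simulation; in the thesis the shards would run as separate SGE jobs on separate GPUs, and the least-squares `grad_fn` here is just a stand-in for DNN backpropagation.

```python
import numpy as np

def data_parallel_step(params, data_shards, grad_fn, lr=0.01):
    """One synchronous data-parallel update: per-shard gradients are
    computed (here sequentially, standing in for parallel workers),
    averaged, and applied identically on every replica."""
    grads = [grad_fn(params, shard) for shard in data_shards]  # per-worker work
    avg_grad = sum(grads) / len(grads)                         # "all-reduce" step
    return params - lr * avg_grad                              # identical update

def grad_fn(w, shard):
    """Toy gradient: mean-squared-error loss for a linear model."""
    x, y = shard
    return 2 * x.T @ (x @ w - y) / len(y)
```

Because the averaged gradient equals the gradient over the combined shards (for equal-sized shards), adding workers shortens wall-clock time per epoch without changing the update, which is why doubling the GPU count roughly halves the training time reported later.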
The conventional language model is the n-gram, which has the following two problems: 1) the probability of an unseen word sequence that does not appear in the training data must be estimated with smoothing methods such as back-off or interpolation, which is unstable; and 2) because n is limited, the model cannot express information from a longer word history.
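The first problem can be made concrete with a tiny interpolated bigram model: an unseen bigram gets no evidence of its own, so its probability is carried entirely by the lower-order unigram term. This is a minimal sketch with a fixed interpolation weight; production systems use more elaborate schemes such as Kneser-Ney back-off.

```python
from collections import Counter

class InterpolatedBigram:
    """Bigram LM with linear interpolation:
    P(w|h) = lam * P_ML(w|h) + (1 - lam) * P_uni(w)."""
    def __init__(self, sentences, lam=0.7):
        self.lam = lam
        self.uni = Counter(w for s in sentences for w in s)
        self.bi = Counter((s[i], s[i + 1])
                          for s in sentences for i in range(len(s) - 1))
        self.total = sum(self.uni.values())

    def prob(self, history, word):
        p_uni = self.uni[word] / self.total
        h_count = self.uni[history]
        # Maximum-likelihood bigram estimate; zero when the pair is unseen.
        p_ml = self.bi[(history, word)] / h_count if h_count else 0.0
        return self.lam * p_ml + (1 - self.lam) * p_uni
```

The unseen-bigram probability is nonzero but depends only on the unigram frequency of the predicted word, not on the history, which is the instability the text refers to.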
The first problem can be solved by learning the model with a Deep Neural Network (DNN) structure. Once trained, a DNN guarantees a corresponding output value at every node of the output layer for a given input. Applying DNNs to acoustic models yielded higher performance for continuous input than the existing Gaussian Mixture Model (GMM) acoustic models. A representative DNN acoustic model has an input dimension of 600 (40 vector dimensions/frame × 15 frames) and an output dimension of approximately 10,000 states. Likewise, once training with a Feed-Forward Neural Network (FFNN) is completed and a word history is given, a DNN language model yields a probability for every word in the lexical dictionary at the output layer. Even though the FFNN has been applied as a DNN language model, good performance compared to the n-gram has not been reported. The first reason is that training takes a considerable amount of time because of the large number of calculations a DNN requires, which is caused by using high-dimensional vectors in a discrete space that merely express a word index as the input. Consequently, because it cannot yet exploit large amounts of training data, it performs worse than an n-gram trained on a large corpus. The second reason is that a large number of calculations is required when the output space is high-dimensional, which again demands a considerable amount of training time. For a vocabulary of 60,000 words, a two-word history is expressed as 120,000 dimensions. The input dimension of a language model is thus approximately 200 times that of an acoustic model, and the output dimension approximately six times; the input and output dimensions of a language model differ greatly from those of an acoustic model.
The contributions of this study are summarized in the following four points: 1) A maximum-likelihood-based automatic vocabulary generation method for unlimited Korean speech recognition was suggested. 2) A Sun Grid Engine (SGE)-based acoustic model training method based on data parallelism was suggested for large amounts of acoustic training data. 3) An RNN-based syllable language model training method using input- and output-dimension reduction was suggested. For the input, the dimension of the high-dimensional discrete input is reduced with the ‘Word2Vec’ method, which is based on the distributional hypothesis. The output dimension is reduced with a 2-stage hierarchical softmax based on word categories. 4) Using the methods in 3), a syllable-based Korean RNN language model was designed, and N-best rescoring was conducted to evaluate the performance of the Korean speech recognition system.
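The 2-stage hierarchical softmax in point 3) factors the output probability as P(w|h) = P(class(w)|h) · P(w|class(w), h), so with 100 categories of 200 words only 100 + 200 = 300 output scores are computed instead of 20,000. The sketch below illustrates the factorization; the weight shapes and variable names are illustrative, not the thesis's actual parameterization.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hierarchical_prob(h, W_class, W_word, cls, idx):
    """P(word | hidden state h) under a 2-stage hierarchical softmax.
    Stage 1 scores the word categories; stage 2 scores only the words
    inside the chosen category."""
    p_class = softmax(W_class @ h)       # stage 1: distribution over categories
    p_word = softmax(W_word[cls] @ h)    # stage 2: distribution within one category
    return p_class[cls] * p_word[idx]
```

Because both stages are proper softmax distributions, the factored probabilities still sum to 1 over the whole vocabulary, so the model remains a valid language model while the per-word output cost drops by roughly two orders of magnitude.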
The results and conclusions of this study are as follows. 1) The unlimited Korean speech recognition system was established: the maximum-likelihood-based automatic vocabulary generation method automatically created a total of 200,000 recognition units without the Out-Of-Vocabulary (OOV) issue, and a speech recognizer using this vocabulary showed an absolute WER decrease of about 5%. 2) The Sun Grid Engine, which supports data parallelism, was used to train the deep neural network acoustic model in parallel on about 320 hours of training data with 5 Graphics Processing Units (GPUs) in approximately 8 hours; doubling the number of GPUs roughly halved the training time. 3) For the input, this thesis showed that the conventional 1-of-N coded input (N = 20,000) can be expressed in 600 dimensions using the ‘Word2Vec’ method. This reduced the input dimension to 1/33 of its original size, and the training speed improved by about 2 times. For the output, the conventional output dimension was approximately 20,000 because of the softmax method; with the hierarchical softmax method, the output could instead be expressed as 100 categories in stage 1 and 200 words per category in stage 2. As a result, the training speed improved by about 2 times. 4) In the Korean Interactive Personal Assistant (IPA) domain, the Korean syllable-based RNN language model was implemented and 10-best rescoring was applied. For 877 evaluation utterances, the sentence recognition rate increased from 57.70% to 60.37%, an absolute improvement of 2.67%.
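The N-best rescoring step in result 4) can be sketched as follows: each first-pass hypothesis keeps its acoustic score, an RNN LM log-probability is added with an interpolation weight, and the list is re-ranked by the combined score. This is a generic sketch; the score-combination weights and the function names are assumptions, not the thesis's exact recipe.

```python
def rescore_nbest(nbest, rnn_logprob, am_weight=1.0, lm_weight=0.5):
    """Re-rank an N-best list with an RNN LM.
    `nbest` is a list of (hypothesis, acoustic_log_score) pairs and
    `rnn_logprob` maps a hypothesis string to its RNN LM log-probability.
    Returns the hypothesis with the best interpolated total score."""
    rescored = [(hyp, am_weight * am_score + lm_weight * rnn_logprob(hyp))
                for hyp, am_score in nbest]
    return max(rescored, key=lambda t: t[1])[0]
```

Rescoring lets an expensive RNN LM influence the final result while the first decoding pass still runs with a cheap n-gram, which is how the 10-best setup above lifted the sentence recognition rate.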
