Multilingual Sentence Alignment Using a Translation Model (번역 모델을 이용한 다국어 문장 정렬) [Korean-language paper]

Category: free Korean-language papers. Editor: Teaching Assistant 金一. Updated: 2017-04-27

Recently, to overcome the limitations of rule-based machine translation, many researchers have studied statistical machine translation. Statistical machine translation is a method that decodes an input document written in a source language using probabilities. For training, we obtain conditional probabilities between the words of the two languages from parallel corpora, which consist of pairs of sentences that are written in different languages but have the same meaning, and we obtain context probabilities from the target language. Good translation results require a large amount of parallel corpora, but collecting parallel corpora manually takes a great deal of time. Bilingual corpora, by contrast, are very easy to collect, so sentence alignment is needed to convert bilingual corpora into parallel corpora automatically.
Sentence alignment is the task of finding corresponding sentences between two documents written in different languages. The traditional approach is the length-based method. This method depends only on the fact that the lengths of aligned sentences in the source and target languages are highly correlated, so it cannot guarantee that the sentences it aligns have the same meaning. To solve this problem, the lexical-based method, which uses lexical information within the input documents, was proposed. However, this method is much slower than the length-based method, and it cannot guarantee good results for languages with different structures, such as Korean and English. To address this, others use a bilingual dictionary instead of lexical information within the input documents. That approach cannot guarantee good results when a document contains cases in which multiple words of the source language correspond to one word of the target language, or vice versa.
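
To make the length-based idea concrete, the following is a minimal sketch of a dynamic-programming aligner that scores candidate beads only by how well their character lengths match. It is not the probabilistic model used in the literature or in this work: the `BEAD_PENALTY` values, the `length_cost` function, and the function names are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Bead types as (number of source sentences, number of target sentences).
# The penalty values are illustrative assumptions, not taken from the abstract.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Relative length mismatch: 0.0 for equal lengths, 1.0 when one side is empty."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    return abs(src_len - tgt_len) / max(src_len, tgt_len)


def align_by_length(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices) with minimal total cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    # Forward pass: extend every reachable prefix (i, j) by each bead type.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                cost = best[i][j] + penalty + length_cost(s_len, t_len)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)

    # Backward pass: recover the beads on the cheapest path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Because the cost looks only at lengths, two unrelated sentences of similar length are aligned as readily as a true translation pair, which is the weakness described above.
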
In this paper, to solve the problems of previous sentence alignment methods, we propose a new method that combines the length-based method with lexical information. The proposed method is as follows: (1) We translate the source document and the target document into English using an existing machine translation system. (2) We apply a monolingual sentence alignment method, in which we use lexical information instead of the case penalty of beads. (3) We convert the result of (2) back into the original source and target languages.
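
The three steps can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the scoring function, so the Dice word overlap, the `SKIP_COST` value, and the restriction to 1-1, 1-0, and 0-1 beads are assumptions, and `to_english_src` / `to_english_tgt` are hypothetical callables standing in for whatever machine translation system is available.

```python
from typing import Callable, List, Tuple

SKIP_COST = 0.7  # assumed cost of leaving a sentence unaligned


def dice_overlap(a: str, b: str) -> float:
    """Dice coefficient over lowercased word sets of two English sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def align_monolingual(src_en: List[str], tgt_en: List[str]) -> List[Tuple[int, int]]:
    """Step (2): align two English documents, returning 1-1 index pairs."""
    n, m = len(src_en), len(tgt_en)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            moves = []
            if i < n and j < m:  # 1-1 bead, cheaper when the words overlap
                moves.append((i + 1, j + 1, 1.0 - dice_overlap(src_en[i], tgt_en[j])))
            if i < n:            # 1-0 bead, source sentence left unaligned
                moves.append((i + 1, j, SKIP_COST))
            if j < m:            # 0-1 bead, target sentence left unaligned
                moves.append((i, j + 1, SKIP_COST))
            for ni, nj, step in moves:
                if best[i][j] + step < best[ni][nj]:
                    best[ni][nj] = best[i][j] + step
                    back[ni][nj] = (i, j)
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if i - pi == 1 and j - pj == 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def align_via_pivot(
    src_sents: List[str],
    tgt_sents: List[str],
    to_english_src: Callable[[str], str],
    to_english_tgt: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Steps (1) to (3): translate, align in English, map back by index."""
    src_en = [to_english_src(s) for s in src_sents]  # step (1)
    tgt_en = [to_english_tgt(t) for t in tgt_sents]
    pairs = align_monolingual(src_en, tgt_en)        # step (2)
    return [(src_sents[i], tgt_sents[j]) for i, j in pairs]  # step (3)
```

Translating sentence by sentence keeps the indices stable, so step (3) reduces to looking up the original sentences by position.
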
As a result, in sentence alignment between Korean and English, the proposed method achieves 96.20% on the F-1 measure, which is higher than all previous methods. In addition, to demonstrate the generality of the method, we experimented on multilingual language pairs, 34 pairs in total. In this experiment, our method scores about 2.27% higher on average than previous length-based methods.
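
For reference, the F-1 measure is the harmonic mean of precision and recall. The sketch below assumes that an alignment is scored as a set of (source index, target index) pairs and that a predicted pair counts as correct only if it matches a gold pair exactly; the abstract does not state the scoring unit, so both the function name and this matching rule are assumptions.

```python
from typing import Iterable, Tuple


def alignment_f1(predicted: Iterable[Tuple[int, int]], gold: Iterable[Tuple[int, int]]) -> float:
    """F-1 over exact-match alignment pairs (assumed scoring unit)."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With two of three predicted pairs correct against four gold pairs, precision is 2/3, recall is 1/2, and F-1 is 4/7.
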
