Improving the Performance of a Probabilistic Model for Korean Morpheme Recovery [Korean-language paper]

Category: Free Korean-language papers  Editor: 金一 (teaching assistant)  Updated: 2017-04-26

Probabilistic models for Korean morphological analysis learn analysis rules from a POS-tagged corpus and automatically build a Korean morphological analyzer (MA). These models are feasible and practical because training data such as the Sejong, ETRI, and KAIST corpora are available, and because the automatically generated MAs perform comparably to the state of the art.
The first step in these models is to recover the morphemes from the surface forms in an Eojeol. Because this step feeds every later one, it matters for overall performance. Recovering morphemes is not simple, however: a Korean Eojeol is both inflectional and agglutinative, and it tends to be long and complex.
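One way to see why recovery is more than splitting an Eojeol at syllable boundaries is to look inside a syllable. The sketch below is not from the thesis; it uses the standard Unicode arithmetic for precomposed Hangul syllables, and the function name `decompose_syllable` is a hypothetical helper. In the Eojeol 갔다 (가 + 았 + 다), the single surface syllable 갔 carries the stem vowel and the final consonant of the past-tense ending at once.

```python
# Letter tables in Unicode order for precomposed Hangul syllables (U+AC00..U+D7A3).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]      # 28 finals, "" = none


def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into (initial, vowel, final)."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return (ch,)  # not a precomposed Hangul syllable: return it unchanged
    cho, rest = divmod(code, 21 * 28)
    jung, jong = divmod(rest, 28)
    return (CHOSEONG[cho], JUNGSEONG[jung], JONGSEONG[jong])


for syllable in "갔다":
    print(syllable, decompose_syllable(syllable))
```

The final ㅆ in 갔 belongs to the ending 았 rather than to the stem 가, so a recovery model has to restore material that the surface syllable has merged.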
This study compares various calculation methods for improving the morpheme recovery model. The recovery model is implemented as an application of the statistical machine translation model, which consists of a translation sub-model and a language sub-model. Methods are compared for the main recovery model and for each sub-model: probability calculation methods for the translation sub-model; smoothing methods for the language sub-model; and Hangul syllable encoding schemes, training data selection methods, and model decoding methods for the main recovery model.
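The two-part structure described above can be illustrated with a minimal sketch, assuming a syllable-level alignment, relative-frequency translation probabilities, an add-one smoothed bigram language sub-model, and exhaustive decoding. None of these choices is confirmed as the thesis's own; they stand in for the methods it compares. The training pairs in `TRAIN` are hypothetical toy data, not taken from any corpus.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: each Eojeol is a list of (surface unit, recovered unit) pairs.
TRAIN = [
    [("갔", "가았"), ("다", "다")],
    [("가", "가"), ("다", "다")],
    [("나", "나"), ("는", "는")],
    [("나", "나"), ("는", "는")],
    [("나", "날"), ("는", "는")],
]

pair_count, unit_count = Counter(), Counter()
bigram_count, context_count = Counter(), Counter()
cands = defaultdict(set)

for eojeol in TRAIN:
    prev = "<s>"  # start-of-Eojeol marker
    for surface, recovered in eojeol:
        pair_count[(surface, recovered)] += 1
        unit_count[recovered] += 1
        bigram_count[(prev, recovered)] += 1
        context_count[prev] += 1
        cands[surface].add(recovered)
        prev = recovered

VOCAB_SIZE = len(unit_count)


def log_translation(surface, recovered):
    """Translation sub-model: log P(surface | recovered) by relative frequency."""
    seen = pair_count[(surface, recovered)]
    if seen == 0:
        return 0.0  # toy assumption: an unseen surface unit is copied with probability 1
    return math.log(seen / unit_count[recovered])


def log_language(prev, recovered):
    """Language sub-model: add-one smoothed bigram log P(recovered | prev)."""
    return math.log((bigram_count[(prev, recovered)] + 1)
                    / (context_count[prev] + VOCAB_SIZE))


def score(surfaces, recovered_seq):
    total, prev = 0.0, "<s>"
    for s, m in zip(surfaces, recovered_seq):
        total += log_translation(s, m) + log_language(prev, m)
        prev = m
    return total


def decode(surfaces):
    """Exhaustive search over all candidate sequences (fine only for toy inputs)."""
    options = [sorted(cands[s]) if s in cands else [s] for s in surfaces]
    return list(max(product(*options), key=lambda seq: score(surfaces, seq)))


print(decode(["갔", "다"]))
print(decode(["나", "는"]))
```

In this toy setup the translation sub-model cannot separate 나 from 날 as sources of the surface syllable 나, so the language sub-model decides. That is one reason the choice of smoothing method can change the output.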
Experiments on various combinations of the methods were run on the Sejong POS-tagged corpus using 10-fold cross-validation. Choosing the best combination of methods improved performance from 95.57% to 99.49%. This indicates that the methods examined in this study affect performance and should be selected carefully for the morpheme recovery model. The recovery model and its calculation methods may also be usable for other languages with inflectional features, because they are essentially language independent apart from the Hangul syllable encoding schemes.
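For readers unfamiliar with the evaluation protocol, 10-fold cross-validation can be sketched as follows. How the thesis actually partitioned the Sejong corpus is not stated, so the round-robin split below and the name `k_fold_splits` are assumptions made for illustration only.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Usage with placeholder sentence IDs standing in for corpus sentences.
sentences = list(range(25))
for train, test in k_fold_splits(sentences):
    print(len(train), len(test))
```

Each configuration would be trained on the nine training folds and scored on the held-out fold, with the ten scores averaged into one figure.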
