Morphological analysis is a basic step in natural language processing, and it can produce multiple analysis results for a single word phrase. Korean is not only an agglutinative language but also has characteristics of an inflectional language, so its morphological analysis requires complicated processing.
In general, Korean morphological analysis requires three basic operations: morpheme restoration (R), morpheme segmentation (S), and morpheme tagging (T). The analysis can be performed over various processing units, such as the Eojeol (word phrase), the syllable, and the Jaso (alphabet letter). Various methods have been developed by combining the three basic operations with these processing units.
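To make the three operations concrete, the following minimal sketch applies R, S, and T to a single Eojeol at the syllable level. It is an illustration only, not the authors' system: the restoration table `RESTORE`, the helper names `restore`, `segment`, and `tag`, and the B/I segmentation labels are hypothetical, and the labels and tags are supplied by hand here, whereas a real analyzer would predict them with trained models. The tag names VV, EP, EF, NNG, and JKB are assumed to follow the Sejong-style tag set.

```python
from typing import Dict, List, Tuple

# Hypothetical toy restoration table: surface syllable -> lexical syllables.
# A real system learns this mapping from a tagged corpus.
RESTORE: Dict[str, str] = {"했": "하였"}


def restore(eojeol: str) -> str:
    """R: replace each surface syllable with its lexical form (identity if unknown)."""
    return "".join(RESTORE.get(syl, syl) for syl in eojeol)


def segment(lexical: str, labels: List[str]) -> List[str]:
    """S: group lexical syllables into morphemes using B (begin) / I (inside) labels."""
    if len(lexical) != len(labels):
        raise ValueError("need exactly one label per syllable")
    morphemes: List[str] = []
    for syl, lab in zip(lexical, labels):
        if lab == "B" or not morphemes:
            morphemes.append(syl)  # start a new morpheme
        else:
            morphemes[-1] += syl   # extend the current morpheme
    return morphemes


def tag(morphemes: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """T: attach one POS tag to each morpheme."""
    if len(morphemes) != len(tags):
        raise ValueError("need exactly one tag per morpheme")
    return list(zip(morphemes, tags))


# Restoration matters here: the surface syllable 했 expands to 하였.
lexical = restore("했다")                      # "하였다"
morphemes = segment(lexical, ["B", "B", "B"])  # ["하", "였", "다"]
analysis = tag(morphemes, ["VV", "EP", "EF"])
print("+".join(f"{m}/{t}" for m, t in analysis))  # 하/VV+였/EP+다/EF

# No restoration needed; I labels join syllables into multi-syllable morphemes.
print(tag(segment(restore("학교에서"), ["B", "I", "B", "I"]), ["NNG", "JKB"]))
```

Keeping restoration as a separate first step means segmentation and tagging always operate on lexical syllables, which is why the later steps can be framed as per-syllable labeling problems.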
In this paper, we define syllable-based probabilistic models for Korean morphological analysis and implement them by cascading a statistical machine translation (SMT) model with conditional random field (CRF) models. The SMT model restores the lexical forms of morphemes; CRF models then segment the morpheme sequence into units and attach a POS tag to each unit. The models are implemented with currently available machine learning tools such as Moses, SRILM, and CRF++. Because these well-known tools have already been validated by many researchers in various areas, we consider them more objective and reliable than locally developed programs. To integrate the stages, we used beam search, keeping a limited number of outputs at each step. Because the SMT and CRF tools produce ranks and scores on different scales, we rescaled the output ranks and probabilities for a better integration. The rescaling improved 10-best recall by about 4.79% (R-ST), 6.042% (R-S-Ts), and 7.165% (R-S-Tm), respectively.
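The integration step can be pictured as a beam search over a cascade of n-best producers. The sketch below is a hypothetical reconstruction, not the paper's rescaling formula, which the abstract does not give: it assumes each stage returns an n-best list of (output, log-score) pairs and rescales every list with a log-softmax so that SMT-style log-linear scores and CRF probabilities become comparable per-stage distributions before they are accumulated. The stage functions `toy_smt` and `toy_crf` and all of their scores are invented for illustration, and merging segmentation and tagging into one stage only loosely mirrors an R-ST configuration.

```python
import math
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]             # (output, log-score)
Stage = Callable[[str], List[Candidate]]  # one n-best producer in the cascade


def normalize_log_scores(cands: List[Candidate]) -> List[Candidate]:
    """Hypothetical rescaling: log-softmax over one stage's n-best list,
    so the candidates' probabilities sum to 1 regardless of the tool's scale."""
    m = max(s for _, s in cands)
    log_z = m + math.log(sum(math.exp(s - m) for _, s in cands))
    return [(out, s - log_z) for out, s in cands]


def cascade_beam(x: str, stages: List[Stage], beam: int) -> List[Candidate]:
    """Run the stages in order, keeping only the `beam` best hypotheses after each."""
    hyps: List[Candidate] = [(x, 0.0)]  # (current output, accumulated log-prob)
    for stage in stages:
        expanded: List[Candidate] = []
        for out, acc in hyps:
            for nxt, s in normalize_log_scores(stage(out)):
                expanded.append((nxt, acc + s))
        expanded.sort(key=lambda c: c[1], reverse=True)
        hyps = expanded[:beam]
    return hyps


def toy_smt(eojeol: str) -> List[Candidate]:
    """Hypothetical R stage: SMT-like n-best with arbitrary log-linear scores."""
    return [("하였다", -1.2), ("했다", -3.5)]


def toy_crf(lexical: str) -> List[Candidate]:
    """Hypothetical joint S+T stage: CRF-like n-best probabilities, returned as logs."""
    table = {
        "하였다": [("하/VV+였/EP+다/EF", 0.7), ("하/XSV+였/EP+다/EF", 0.2)],
        "했다": [("했/VV+다/EF", 0.9)],  # deliberately poor candidate
    }
    return [(a, math.log(p)) for a, p in table[lexical]]


results = cascade_beam("했다", [toy_smt, toy_crf], beam=2)
for analysis, logp in results:
    print(analysis, round(math.exp(logp), 3))
# 하/VV+였/EP+다/EF 0.707
# 하/XSV+였/EP+다/EF 0.202
```

Without some rescaling, the raw SMT scores (around -1 to -4 here) and CRF log-probabilities would be summed on incompatible scales, so one tool could dominate the ranking; a per-list normalization is one simple way to avoid that, though other schemes (for example, rank-based scores) are equally plausible.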