This thesis aims to find the best window size for Korean word sense disambiguation. We test 64 different windows. Each window is written as (i, j), where i is the left window size and j is the right window size. The largest size on each side is 7, so every combination of i and j from 0 to 7 except (0, 0) is tested, giving 63 windows up to (7, 7). We also test using all words in the sentence as context, which is the 64th configuration. In addition, we determine the best window size for each part of speech, for each word, and for each level of the most-frequent-sense baseline.
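The window scheme can be made concrete with a short Python sketch. It enumerates the 63 (i, j) windows plus a whole-sentence option and extracts the context around a target word. The function names `all_windows` and `context`, and the use of `None` to mean "whole sentence", are illustrative assumptions. The thesis does not say how windows behave near sentence edges, so here they are simply clipped at the sentence boundaries.

```python
from typing import List, Optional, Tuple

MAX_SIDE = 7  # largest left/right window size described in the thesis

# A window is (left, right); None is a hypothetical marker for "whole sentence".
Window = Optional[Tuple[int, int]]


def all_windows(max_side: int = MAX_SIDE) -> List[Window]:
    """Every (left, right) window up to (max_side, max_side) except (0, 0), plus None."""
    windows: List[Window] = [
        (i, j)
        for i in range(max_side + 1)
        for j in range(max_side + 1)
        if (i, j) != (0, 0)
    ]
    windows.append(None)  # use every other word in the sentence
    return windows


def context(words: List[str], target: int, window: Window) -> List[str]:
    """Context words around position `target`, clipped at the sentence edges."""
    if window is None:
        return words[:target] + words[target + 1:]
    left, right = window
    start = max(0, target - left)
    return words[start:target] + words[target + 1:target + 1 + right]
```

For example, `context(["a", "b", "c", "d", "e"], 2, (1, 1))` returns `["b", "d"]`, and `len(all_windows())` is 64 (63 windows plus the whole sentence).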
The Sejong morphological-semantic analysis corpus is used for training and test data: 90% of the corpus is used for training and the remaining 10% for testing. From the 20,000 words that occur in the corpus, we select 1,902 words that not only have multiple senses but also appear with two or more different senses in the corpus.
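A minimal sketch of this data preparation, assuming each corpus instance is a hypothetical `(word, sense, context)` record and that `dictionary_senses` is a hypothetical mapping from a word to its number of dictionary senses. The thesis does not say whether the 90/10 split is random or sequential; the seeded random shuffle below is an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical record layout: (target word, sense label, context tokens).
Instance = Tuple[str, str, List[str]]


def split_corpus(items: List[T], train_ratio: float = 0.9, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and cut it into training and test portions."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def select_ambiguous_words(instances: List[Instance], dictionary_senses: Dict[str, int]) -> Set[str]:
    """Keep words with multiple dictionary senses that also occur with 2+ senses in the corpus."""
    observed: Dict[str, Set[str]] = defaultdict(set)
    for word, sense, _ in instances:
        observed[word].add(sense)
    return {
        word
        for word, senses in observed.items()
        if dictionary_senses.get(word, 0) >= 2 and len(senses) >= 2
    }
```

The second condition matters: a word listed with several senses but used with only one sense in the corpus gives a classifier nothing to disambiguate, so it is excluded.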
We use only content words; function words such as josa (postpositional particles) and eomi (verbal endings) are ignored in our experiment. We use a Naïve Bayes model to train and test word sense disambiguation. The experimental results show that the best window is the whole sentence: it achieves the highest precision, 91.84%, and (7, 7) is second with 91.17%. Nouns, incomplete (dependent) nouns, and adjectives also score best when all words in the whole sentence are used, whereas verbs, adverbs, and determiners score best at (5, 5), (2, 6), and (2, 3), respectively. The whole sentence also scores best in the experiment for each most-frequent-sense baseline level. We therefore conclude that the whole sentence is the most appropriate context for word sense disambiguation. In future work, we will experiment with other learning models and with additional features, including function words.
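The content-word filter, the Naïve Bayes classifier, and the precision measure can be sketched in Python as follows. This assumes Sejong-style POS tags, in which josa tags begin with `J` (e.g. `JKO`) and eomi tags begin with `E` (e.g. `EF`), and it uses add-one (Laplace) smoothing, which the thesis does not specify. `content_words`, `NaiveBayesWSD`, and `precision` are hypothetical names. The classifier picks the sense s that maximizes log P(s) + Σ log P(w | s) over the context words w.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Assumed Sejong tag prefixes for function morphemes: J* = josa, E* = eomi.
FUNCTION_TAG_PREFIXES = ("J", "E")


def content_words(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Drop josa and eomi, keeping only content morphemes."""
    return [form for form, tag in tagged if not tag.startswith(FUNCTION_TAG_PREFIXES)]


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes sense classifier with add-one smoothing (assumed)."""

    def __init__(self) -> None:
        self.sense_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, examples: Iterable[Tuple[List[str], str]]) -> "NaiveBayesWSD":
        for ctx, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(ctx)
            self.vocab.update(ctx)
        return self

    def predict(self, ctx: List[str]) -> Optional[str]:
        total = sum(self.sense_counts.values())
        v = len(self.vocab) or 1  # guard against an empty vocabulary
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            n_feats = sum(self.feature_counts[sense].values())
            score = math.log(count / total)  # log prior P(s)
            for w in ctx:
                # smoothed log likelihood P(w | s)
                score += math.log((self.feature_counts[sense][w] + 1) / (n_feats + v))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def precision(model: NaiveBayesWSD, test: List[Tuple[List[str], str]]) -> float:
    """Fraction of test instances whose predicted sense matches the gold sense."""
    correct = sum(model.predict(ctx) == sense for ctx, sense in test)
    return correct / len(test)
```

Because this sketch answers every test instance, its precision equals accuracy; a system allowed to abstain would divide by the number of attempted instances instead.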