Part-of-speech tagging is the task of resolving morphological ambiguity and assigning an appropriate sequence of category tags to a sentence. There are three general approaches: rule-based, statistical, and hybrid. The rule-based approach is highly accurate but has narrow coverage. The statistical approach, by contrast, has broad coverage but relatively low accuracy. The hybrid approach combines the merits of both, so it is the approach mainly used in part-of-speech tagging systems today.
In a part-of-speech tagging system that uses the hybrid approach, rule-based information and statistical information complement each other. However, there is a limit to how far tagging performance can be improved with these two kinds of information alone.
This thesis describes how to improve the accuracy of Korean part-of-speech tagging by using resolution information for individual ambiguous words. This resolution information is constructed for ambiguous words that occur with high frequency in the Sejong corpus. Each piece of information may refer to the morphemes, morphological tags, and/or word senses of not only the ambiguous word itself but also the words around it.
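The idea of per-word resolution information can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's actual data format or rules: the `Word` and `Rule` types, the `RESOLUTION_INFO` table, the single rule for 나는, and the `most_frequent` fallback that stands in for a statistical tagger are all hypothetical. The sketch only shows how a rule keyed on one ambiguous word can inspect the tags of a neighbouring word and otherwise defer to a general tagger.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# One analysis is a list of (morpheme, tag) pairs, e.g. [("나", "NP"), ("는", "JX")].
Analysis = List[Tuple[str, str]]


@dataclass
class Word:
    surface: str
    analysis: Optional[Analysis] = None  # None while the word is still unresolved


@dataclass
class Rule:
    condition: Callable[[List[Word], int], bool]  # looks at the word and its context
    analysis: Analysis                            # analysis chosen when the condition holds


def prev_ends_with_tag(tag: str) -> Callable[[List[Word], int], bool]:
    """Build a condition: the previous word's last morpheme carries `tag`."""
    def check(words: List[Word], i: int) -> bool:
        if i == 0 or words[i - 1].analysis is None:
            return False
        return words[i - 1].analysis[-1][1] == tag
    return check


# Hypothetical resolution information, keyed by the ambiguous surface form.
# Illustrative rule: after an object-marked word, read 나는 as "flying".
RESOLUTION_INFO: Dict[str, List[Rule]] = {
    "나는": [Rule(prev_ends_with_tag("JKO"), [("날", "VV"), ("는", "ETM")])],
}

# Stand-in for the statistical component: the most frequent analysis per word.
MOST_FREQUENT: Dict[str, Analysis] = {
    "나는": [("나", "NP"), ("는", "JX")],
}


def most_frequent(words: List[Word], i: int) -> Optional[Analysis]:
    return MOST_FREQUENT.get(words[i].surface)


def resolve(words: List[Word], i: int,
            fallback: Callable[[List[Word], int], Optional[Analysis]]) -> Optional[Analysis]:
    """Try the word-specific rules first; defer to the fallback tagger otherwise."""
    for rule in RESOLUTION_INFO.get(words[i].surface, []):
        if rule.condition(words, i):
            return rule.analysis
    return fallback(words, i)


sentence = [Word("하늘을", [("하늘", "NNG"), ("을", "JKO")]), Word("나는"), Word("새")]
print(resolve(sentence, 1, most_frequent))
```

Keeping the rules in a table keyed by surface form means that only the listed ambiguous words are affected, and every other word is handled by the general tagger unchanged.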
In the Sejong corpus, the 500 most frequent ambiguous words account for 50.7% of all ambiguous-word occurrences. Resolution information for individual ambiguous words is constructed for these 500 words. On the test corpus, this information resolves about 36% of the ambiguities.
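The coverage figure is a cumulative share of occurrences, which can be sketched as follows. The function names and the toy counts are assumptions made for illustration; the real counts would come from the Sejong corpus and are not reproduced here.

```python
from typing import Dict


def coverage_of_top_k(freqs: Dict[str, int], k: int) -> float:
    """Share of all ambiguous-word occurrences covered by the k most frequent words."""
    total = sum(freqs.values())
    top = sorted(freqs.values(), reverse=True)[:k]
    return sum(top) / total


def words_needed(freqs: Dict[str, int], target: float) -> int:
    """Smallest number of most-frequent words whose occurrences reach `target` coverage."""
    total = sum(freqs.values())
    running = 0
    for k, count in enumerate(sorted(freqs.values(), reverse=True), start=1):
        running += count
        if running >= target * total:
            return k
    return len(freqs)


# Toy occurrence counts for four hypothetical ambiguous words.
toy_counts = {"w1": 50, "w2": 30, "w3": 15, "w4": 5}
print(coverage_of_top_k(toy_counts, 2))
print(words_needed(toy_counts, 0.507))
```

Because word frequencies are heavily skewed, a small set of frequent ambiguous words can cover a large share of occurrences, which is what makes hand-built information for 500 words worthwhile.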
The experiments show that using resolution information for individual ambiguous words improves the accuracy of Korean part-of-speech tagging.