A Study of the Reliability of Automatic Japanese Part-of-Speech Taggers

Author: MAO Wen-wei (毛文伟) [1] (Shanghai International Studies University, Shanghai 201783, China)

Affiliation: [1] Shanghai International Studies University, Shanghai 201783

Source: Media in Foreign Language Instruction (《外语电化教学》), 2017, No. 3, pp. 10-14 (5 pages)

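Funding: This paper is an outcome of the 2017 Ministry of Education Humanities and Social Sciences Youth Fund project "A Cognitive Linguistic Study of Expression Errors by Chinese Learners of Japanese" (12YJC740076) and the Shanghai International Studies University Young Researchers Innovation Team project "Second Language Acquisition Research Based on a Japanese Learner Corpus" (QJTD11MWW01).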

Abstract: Automatic part-of-speech (POS) tagging has matured and now provides strong support for corpus construction. Unlike native-speaker data, learner output contains a large number of errors, which inevitably interfere with tagging accuracy. In addition to accuracy, robustness to such interference therefore also needs to be considered. This paper measures and compares the accuracy of open-source Japanese automatic POS taggers on learner texts, and examines whether tagging reliability correlates with the quality of the texts. The results show that MeCab performs best, followed by ChaSen, with JUMAN slightly behind. The study also finds that the taggers' accuracy on the learner corpus is even higher than on native-speaker corpora. These taggers can therefore serve as reliable tools for building learner corpora.
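The two quantities the abstract refers to, per-text tagging accuracy and its correlation with text quality, can be sketched as follows. This is a minimal illustration and not the paper's procedure: it assumes the tagger output has already been aligned token by token with a hand-checked gold standard, and it uses the Pearson coefficient, which the abstract does not specify. The function names and the sample tags are hypothetical.

```python
from math import sqrt


def tagging_accuracy(gold_tags, predicted_tags):
    """Share of tokens whose predicted POS tag equals the gold tag.

    Assumes both sequences are already aligned token by token.
    """
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must be aligned token by token")
    if not gold_tags:
        raise ValueError("empty tag sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical example: one learner sentence with one mis-tagged token.
gold = ["名詞", "助詞", "動詞", "助動詞"]
predicted = ["名詞", "助詞", "名詞", "助動詞"]
accuracy = tagging_accuracy(gold, predicted)
```

In a study of this kind, one accuracy value would be computed per learner text and then passed to `pearson` together with a per-text quality score (for example an error rate); a coefficient near zero would suggest that the tagger is robust to learner errors.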

Keywords: corpus; tagging; hidden Markov model; Japanese

Classification: H319.3 [Language and Writing - English]
