Thai Syllable Segmentation Method Based on Conditional Random Fields

Authors: ZHAO Shi-yu (赵世瑜), XIAN Yan-tuan (线岩团), GUO Jian-yi (郭剑毅), YU Zheng-tao (余正涛), HONG Xuan-gui (洪玄贵), WANG Hong-bin (王红斌)

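Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China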

Source: Computer Science (《计算机科学》), 2017, No. 3, pp. 54-56, 83 (4 pages)

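Funding: This work was supported by the National Natural Science Foundation of China (61363044, research on Internet-oriented Thai-Chinese bilingual corpus acquisition and alignment methods; 61462054, research on Chinese-Thai cross-lingual news event retrieval methods) and by a Key Project of the Yunnan Provincial Department of Education (2017Z021, research on similarity computation in Chinese-Thai cross-lingual news event retrieval).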

Abstract: The syllable is the basic unit of word formation and pronunciation in Thai, and Thai syllable segmentation is important for research on Thai lexical analysis, speech synthesis, and speech recognition. Based on the structural characteristics of Thai syllables, a Thai syllable segmentation method based on conditional random fields (CRFs) is proposed. The method defines features from Thai letter categories and letter positions, and uses a CRF to label the letters of a Thai sentence as a sequence, from which the syllable segmentation is obtained. A Thai syllable segmentation corpus was annotated on the basis of the InterBEST2017 Thai corpus. Experiments on this corpus show that the method makes effective use of letter category and letter position information, reaching a precision of 99.115%, a recall of 99.284%, and an F-measure of 99.199%.
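
The abstract describes the approach only at a high level, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It shows how per-letter features built from a letter category and a position window might look, how a predicted label sequence could be turned back into syllables, and how the reported F-measure follows from the reported precision and recall. The category set in `char_category`, the two-tag `B`/`I` labeling scheme, the window size, and all function names are hypothetical; the paper's actual feature templates and tag set are not given in this record.

```python
# Illustrative sketch only: the categories, the B/I tag set and the feature
# names below are assumptions, not the feature templates used in the paper.

def char_category(ch):
    """Map a character to a coarse Thai letter category (assumed grouping)."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "CONS"            # consonants
    if 0x0E40 <= cp <= 0x0E44:
        return "LEAD_VOWEL"      # vowels written before the consonant
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E39:
        return "ABOVE_BELOW"     # vowels written above or below
    if cp in (0x0E30, 0x0E32, 0x0E33, 0x0E45):
        return "FOLLOW_VOWEL"    # vowels written after the consonant
    if 0x0E48 <= cp <= 0x0E4B:
        return "TONE"            # tone marks
    return "OTHER"


def char_features(sentence, i, window=2):
    """Features for the letter at index i: letters and categories in a window."""
    feats = {"bias": 1.0}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(sentence):
            feats[f"char[{off}]"] = sentence[j]
            feats[f"cat[{off}]"] = char_category(sentence[j])
        else:
            feats[f"cat[{off}]"] = "PAD"   # position outside the sentence
    return feats


def sentence_features(sentence):
    """One feature dict per letter, the usual input shape for a sequence labeler."""
    return [char_features(sentence, i) for i in range(len(sentence))]


def labels_to_syllables(sentence, labels):
    """Rebuild syllables from B (syllable start) / I (inside) labels."""
    syllables = []
    for ch, lab in zip(sentence, labels):
        if lab == "B" or not syllables:
            syllables.append(ch)
        else:
            syllables[-1] += ch
    return syllables


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "ภาษาไทย"
    gold = ["B", "I", "B", "I", "B", "I", "I"]   # hand-written labels
    print(sentence_features(text)[4])
    print(labels_to_syllables(text, gold))
    print(round(f_measure(99.115, 99.284), 3))
```

In a full pipeline the feature dictionaries would be passed to a CRF trainer together with gold labels, and the trained model's predicted labels would be decoded with `labels_to_syllables`. The last line checks that the reported F-measure is consistent with the reported precision and recall.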

Keywords: conditional random fields
