Text Analysis and Processing in a Thai Text-to-Speech System

Abstract:

Speech synthesis is the process of using a computer to convert input text into speech signals that people can understand. Together with speech recognition, it is an essential supporting technology for spoken human-machine communication. Text-to-speech (TTS) systems are currently an effective way to implement speech synthesis, and the naturalness of the synthesized speech has become the key factor in how widely the technology is adopted. A TTS system consists of a front-end text analysis module and a back-end speech synthesis module, and the results of text analysis and processing directly determine the naturalness of the synthesized speech. With the goal of developing a Thai TTS system, this thesis studies and implements word segmentation, normalization, and romanization of Thai text. The main contributions are:

1. A Thai character cluster set is constructed to reflect the characteristics of Thai script and is applied in a forward and backward maximum matching segmentation algorithm. Experimental results show that segmentation accuracy on a corpus containing out-of-vocabulary words rises from 85.69% to 94.04%.

2. A method combining rules with keywords is proposed for Thai text normalization. In the special-character processing module, the special tokens that occur in Thai text (numbers, physical units, currency symbols, abbreviations, and so on) are first classified. The token types that are prone to ambiguity are then identified and a keyword dictionary is built for them. On this basis, a C program processes the special tokens and converts them into standard Thai text. Experimental results show an accuracy of 97.83% on the closed (in-set) test and 97.12% on the open (out-of-set) test, with disambiguation accuracy above 95% for most non-standard words.

3. Based on the structure of the Thai syllable, the collocation rules between vowels and consonants, and between vowels and final consonants within the rhyme, are summarized and organized. Taking the syllable as the basic unit, a Perl script romanizes Thai text according to these rules. Tests show that the romanized output meets the requirements of the back-end speech synthesis module and also reflects the results of word segmentation and text normalization.
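The cluster-based forward and backward maximum matching described in item 1 of the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the clustering rule (attach combining marks to the preceding character and leading vowels to the following one), the function names, and the tie-breaking heuristic (fewer words wins, backward result on ties) are all assumptions made for the example.

```python
import unicodedata

# Thai vowels written before the consonant they belong to (U+0E40..U+0E44).
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def to_clusters(text):
    """Split text into simplified, indivisible character clusters (assumed rule)."""
    clusters = []
    attach_next = False  # True right after a leading vowel
    for ch in text:
        # Thai upper/lower vowels and tone marks are nonspacing marks (Mn).
        is_mark = unicodedata.category(ch) == "Mn"
        if clusters and (attach_next or is_mark):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        attach_next = ch in LEADING_VOWELS
    return clusters


def forward_mm(clusters, lexicon, max_len):
    """Greedy longest match, scanning left to right."""
    words, i = [], 0
    while i < len(clusters):
        for n in range(min(max_len, len(clusters) - i), 0, -1):
            candidate = "".join(clusters[i:i + n])
            if n == 1 or candidate in lexicon:  # a lone cluster is the fallback
                words.append(candidate)
                i += n
                break
    return words


def backward_mm(clusters, lexicon, max_len):
    """Greedy longest match, scanning right to left."""
    words, j = [], len(clusters)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            candidate = "".join(clusters[j - n:j])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                j -= n
                break
    return words[::-1]


def segment(text, lexicon, max_len=8):
    """Run both passes; keep the result with fewer words (assumed heuristic)."""
    clusters = to_clusters(text)
    fwd = forward_mm(clusters, lexicon, max_len)
    bwd = backward_mm(clusters, lexicon, max_len)
    return fwd if len(fwd) < len(bwd) else bwd
```

With a lexicon containing เด็ก, ไป and โรงเรียน, `segment("เด็กไปโรงเรียน", lexicon)` splits the sentence into those three words. Because matching works on clusters rather than single characters, an out-of-vocabulary stretch falls back to whole clusters, so a consonant is never separated from its vowel sign or tone mark.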

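Contents:

Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
    1.1 Preface
    1.2 Current state of Thai speech synthesis
    1.3 Common text analysis and processing methods
    1.4 Research approach and the author's work
    1.5 Organization of the thesis
Chapter 2 Fundamentals of Thai text processing
    2.1 Introduction to Thai
    2.2 Text collection and dictionary construction
        2.2.1 Text corpus collection
        2.2.2 Dictionary construction
    2.3 Thai parts of speech and tagging
        2.3.1 Thai parts of speech
        2.3.2 Part-of-speech tagging
    2.4 Thai phoneme list
    2.5 Chapter summary
Chapter 3 Word segmentation methods and implementation
    3.1 Word segmentation techniques
        3.1.1 Dictionary-based segmentation
        3.1.2 Understanding-based segmentation
        3.1.3 Segmentation based on statistical language models
    3.2 Research on Thai word segmentation
    3.3 Analysis of the Thai segmentation experiment procedure
        3.3.1 Construction of the Thai character cluster set
        3.3.2 Forward and backward maximum matching algorithm
        3.3.3 Analysis of matching results
    3.4 Experiments and analysis of results
    3.5 Chapter summary
Chapter 4 Text normalization
    4.1 Text normalization
    4.2 Forms and classification of Thai non-standard words
        4.2.1 Forms of Thai non-standard words
        4.2.2 Classification of Thai non-standard words
    4.3 Normalization methods for Thai non-standard words
        4.3.1 Identification of non-standard words
        4.3.2 Disambiguation of non-standard words
        4.3.3 Generation of non-standard words
    4.4 Experiments and analysis of results
        4.4.1 Corpus
        4.4.2 Experimental design
        4.4.3 Experimental results and analysis
    4.5 Chapter summary
Chapter 5 Romanization of Thai text
    5.1 Romanization
    5.2 Thai romanization
    5.3 Syllable-based romanization of Thai text
        5.3.1 Thai syllable structure
        5.3.2 Vowel-consonant collocation rules
        5.3.3 Vowel-final consonant collocation rules
        5.3.4 Special characters
        5.3.5 Tones
    5.4 Implementation of Thai romanization
    5.5 Chapter summary
Chapter 6 Conclusion and outlook
    6.1 Conclusion
    6.2 Outlook
References
Acknowledgements
Projects participated in and papers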
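Section 4.3 in the contents breaks normalization into identification, disambiguation, and generation of non-standard words. A toy sketch of that three-step flow is given below, under clearly stated assumptions: the input is already tokenized, only plain digit strings and baht amounts are recognized, and the single keyword entry ("โทร", meaning "tel.") is hypothetical rather than taken from the thesis's keyword dictionary. The thesis implements this stage in C; Python is used here only for brevity.

```python
import re

DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]  # units up to hundred-thousands
YI, ET, BAHT = "ยี่", "เอ็ด", "บาท"  # "twenty" prefix, final "one", currency word
TEL = "โทร"  # "tel."

# Hypothetical keyword dictionary: the token before a number selects its reading.
KEYWORDS = {TEL: "digits"}


def read_digits(text):
    """Read a digit string one digit at a time (e.g. a phone number)."""
    return "".join(DIGITS[int(ch)] for ch in text)


def read_cardinal(text):
    """Spell a digit string below one million as a Thai cardinal number."""
    digits = text.lstrip("0")
    if not digits:
        return DIGITS[0]
    if len(digits) > len(PLACES):
        raise ValueError("toy reader only handles values below one million")
    words = []
    for pos, ch in enumerate(digits):
        d, place = int(ch), len(digits) - pos - 1
        if d == 0:
            continue
        if place == 0 and d == 1 and len(digits) > 1:
            words.append(ET)               # trailing 1 is read "et"
        elif place == 1 and d == 2:
            words.append(YI + PLACES[1])   # 20 is read "yi sip"
        elif place == 1 and d == 1:
            words.append(PLACES[1])        # 10 is read "sip", not "one ten"
        else:
            words.append(DIGITS[d] + PLACES[place])
    return "".join(words)


def normalize(tokens):
    """Identify non-standard tokens, disambiguate by keyword, generate words."""
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        money = re.fullmatch(r"\u0e3f([0-9]+)", tok)  # baht sign + amount
        if money:
            out.append(read_cardinal(money.group(1)) + BAHT)
        elif re.fullmatch(r"[0-9]+", tok):
            reading = KEYWORDS.get(prev, "cardinal")  # keyword-based disambiguation
            out.append(read_digits(tok) if reading == "digits" else read_cardinal(tok))
        else:
            out.append(tok)  # already a standard word
    return out
```

The same digit string is read differently depending on its left context: `normalize(["250"])` spells a cardinal, while `normalize([TEL, "081"])` reads the number digit by digit because the preceding keyword marks it as a phone number.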

