A French Text-Processing Model for Generating a Question-Answering System [French-Language Thesis]

Category: free French-language theses. Editor: Huang Doudou. Updated: 2017-05-08
Note: this material is a free paper collected from the web and may be incomplete.
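This thesis falls within natural language processing (traitement automatique du langage naturel, TALN). Its aim is to build a processing model that can automatically analyze French text and generate responses from it. The model's design combines Eastern and Western theory: it draws on Western semantic networks, semantic primitives, and various knowledge-representation systems, and on the sentence-category analysis and five-tuple idea of HNC (Hierarchical Network of Concepts), the theory proposed in the late 1990s by Huang Zengyang and colleagues at the Institute of Acoustics of the Chinese Academy of Sciences. The model is not intended for direct commercial use; it is meant to offer a new line of research for the automatic processing and generation of French. To illustrate the theoretical model concretely, we chose a French fairy tale as the worked example. Admittedly, the model can only produce a short human-machine dialogue from one simple text, which is modest in both dialogue scope and text difficulty. Seen from another angle, however, the same approach could be extended to broader areas such as intelligent retrieval, information filtering, and information extraction, because these applications all rest on a common theoretical basis: automatic understanding of text, in which sentence-category formats are built around the verb semantic chunk and provide a path to understanding the sentence.

The theory's main innovation lies in the five-tuple method of representing concepts. Put simply, the five-tuple is a set of five elements used to express the features of an abstract concept: dynamic (v), static (g), attribute (u), value (z), and effect (r). Compared with other existing semantic classifications, the five-tuple better preserves the natural links between concepts, which helps build chains of association. The traditional grammatical division (verbs, nouns, adjectives, and so on) is thus replaced by a division based on semantic and pragmatic properties, and the symbolic notation suits the needs of machine programming.

Beyond concept representation, the theory makes a notable innovation at the sentence level: sentence-category analysis. With machine processing in mind, HNC replaces traditional grammatical classification with a semantic classification of sentences. Sentence categories and their corresponding formats are built around the verb semantic chunk; using predefined decision conditions and semantic-chunk markers, the system can then obtain information about the other semantic chunks and process the whole sentence. Through the predefined symbol system, a processed sentence can be converted into a fixed symbolic expression, which makes it easier for a machine to recognize and process.

Like other human-machine dialogue models, our model has two basic stages: question processing and answer generation. As with the text, a question is analyzed and finally converted into a symbolic expression of the same kind. With both in a compatible representation, the question can be searched for and matched against the text. This matching is not limited to finding identical words or sentences; it approximates semantic distance on the basis of understanding, which is where the advantage of intelligent processing lies. At the current stage of research, answer generation is not the primary objective. An answer is accepted as long as the information it provides is meaningful with respect to what the question asks; style is not considered for now. In other words, higher-level linguistic requirements (register, rhetoric, and so on) are out of scope.

We also analyze and explain several problems that natural language processing cannot avoid, such as segmentation, anaphora (anaphore), disambiguation (désambiguïsation), and semantic annotation (annotation): for each, we examine the nature and current state of the problem and describe the method we adopt at this stage.

In short, the model built in this thesis is experimental. It aims to offer a new approach to the automatic processing of French text, not to be put into practical use immediately, and we hope it will serve as a useful reference for other researchers. Working alone, we could set only near-term objectives, build a limited lexicon, and handle simple sentences and texts. With the support of a team, it would be possible to build a larger lexicon and define more complex sentence patterns in advance, and so apply this approach to other areas such as intelligent information search, sensitive-information filtering, and distance teaching. Building a large lexicon, studying sentence categories at scale, and implementing the programs all require substantial human, financial, and technical resources.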
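To make the five-tuple idea more concrete, here is a minimal Python sketch. It assumes that each word is stored as an abstract concept node plus one of the five aspects (v, g, u, z, r). The node names, the `ConceptSymbol` class, the `v:affection` notation, and the toy lexicon are all hypothetical illustrations; they are not actual HNC symbols and not the thesis's own encoding.

```python
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    """The five aspects named in the summary."""
    DYNAMIC = "v"
    STATIC = "g"
    ATTRIBUTE = "u"
    VALUE = "z"
    EFFECT = "r"


@dataclass(frozen=True)
class ConceptSymbol:
    node: str       # hypothetical identifier of an abstract concept
    aspect: Aspect  # which of the five aspects the word expresses

    def code(self) -> str:
        # Illustrative notation only, e.g. "v:affection"
        return f"{self.aspect.value}:{self.node}"

    def related_to(self, other: "ConceptSymbol") -> bool:
        # Words sharing a node stay linked whatever their part of speech
        return self.node == other.node


# Toy lexicon (assumed entries, chosen only for illustration)
LEXICON = {
    "aimer": ConceptSymbol("affection", Aspect.DYNAMIC),
    "amour": ConceptSymbol("affection", Aspect.STATIC),
    "amoureux": ConceptSymbol("affection", Aspect.ATTRIBUTE),
    "manger": ConceptSymbol("ingestion", Aspect.DYNAMIC),
}


def family(node: str) -> list:
    """Return, in alphabetical order, all words attached to one concept node."""
    return sorted(word for word, sym in LEXICON.items() if sym.node == node)
```

In this sketch, `aimer`, `amour`, and `amoureux` belong to different grammatical classes but share one node, so `family("affection")` returns all three. This is one simple way to read the claim that the five-tuple keeps the natural links between concepts that a verb/noun/adjective split would separate.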

Abstract:

This thesis belongs to the field of natural language processing (TALN). Its goal is to build a processing model able to analyze French text automatically and generate responses from it. The model's design combines Eastern and Western theory: it draws on Western semantic networks, semantic primitives, and various knowledge-representation systems, and on the sentence-category analysis and five-tuple idea of HNC (Hierarchical Network of Concepts), proposed in the late 1990s by Huang Zengyang and colleagues at the Institute of Acoustics of the Chinese Academy of Sciences. The model is not aimed at direct commercial application; it is intended to offer a new research direction for the automatic processing and generation of French. To explain the theoretical model concretely, we use a French fairy tale as the example text. The model can only generate a short human-machine dialogue from one simple text, so both the scope of the dialogue and the difficulty of the text are modest. From another perspective, however, this kind of processing can be extended to broader areas such as intelligent retrieval, information filtering, and information extraction, since these applications share a common theoretical basis: automatic understanding of text, in which sentence-category formats are built around the verb semantic chunk and provide a way to understand the sentence.

The main innovation of the theory lies in the five-tuple method of representing concepts. In brief, the five-tuple is a group of five elements used to express the features of an abstract concept: dynamic (v), static (g), attribute (u), value (z), and effect (r). Compared with other existing semantic classifications, the five-tuple better preserves the natural links between concepts, which helps establish chains of association. In this way, the traditional grammatical division (verbs, nouns, adjectives, etc.) is replaced by a classification based on semantic and pragmatic properties, and the symbolic representation system is well suited to machine programming. Besides concept representation, the theory also offers a significant innovation at the sentence level: sentence-category analysis. Taking the characteristics of machine processing into account, HNC replaces traditional grammatical classification with a semantic classification of sentences. Sentence categories and their corresponding formats are built around the verb semantic chunk; predefined decision conditions and semantic-chunk markers then make it possible to obtain information about the other semantic chunks and to process the whole sentence. Through the predefined symbol system, a processed sentence can be converted into a fixed symbolic expression, which facilitates recognition and processing by the machine.

Like other human-machine dialogue models, our model comprises two basic steps: question processing and answer generation. As with the text, the question is analyzed and eventually transformed into a symbolic expression of the same nature. Because the two representations are compatible, the question can be searched for and matched against the text. This matching is not limited to finding identical words or sentences; it approximates semantic distance on the basis of understanding, which shows the advantage of intelligent processing. At the current stage of research, answer generation is not the main objective. An answer is accepted as long as the information it supplies is meaningful with respect to what the question asks, and stylistic considerations are set aside for now. That is, higher-level language requirements (register, rhetoric, etc.) are not taken into account.

We also analyze and explain problems that natural language processing cannot avoid, such as segmentation, anaphora (anaphore), disambiguation (désambiguïsation), and semantic annotation (annotation): we examine the nature and current state of each problem and describe the solution we adopt at present. In short, the model built in this thesis is experimental. It aims to offer a new approach to the automatic processing of French text and does not attempt to be put into practical use immediately. We hope this approach provides a useful reference for other researchers. As individual researchers, we could set only near-term objectives, build a limited lexicon, and process simple sentences and texts. With the support of a team, it would be possible to build a larger lexicon and define more complex sentence patterns, and so apply this approach to other areas such as intelligent information search, sensitive-information filtering, and distance teaching. Building any large lexicon, studying sentence categories at scale, and carrying out the programming all require substantial human, financial, and technical resources.
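The matching step described in the abstract, where question and text share one symbolic representation and are compared by approximate semantic distance, can be sketched as follows. The abstract does not specify the distance measure, so this example substitutes a simple tree-path distance over a hypothetical concept hierarchy. The chunk labels (`verb`, `agent`, `object`), the hierarchy paths, the `missing_penalty` value, and the sample sentences are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Code = Tuple[str, ...]        # path in a hypothetical concept hierarchy
Expression = Dict[str, Code]  # semantic-chunk label -> concept code


def concept_distance(a: Code, b: Code) -> int:
    """Number of tree edges separating two hierarchy paths."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def expression_distance(question: Expression, sentence: Expression,
                        missing_penalty: int = 4) -> int:
    """Sum concept distances over the chunks the question specifies."""
    total = 0
    for label, code in question.items():
        if label in sentence:
            total += concept_distance(code, sentence[label])
        else:
            total += missing_penalty  # the sentence lacks this chunk
    return total


def best_match(question: Expression, sentences: List[Expression]) -> int:
    """Index of the text sentence closest to the question."""
    return min(range(len(sentences)),
               key=lambda i: expression_distance(question, sentences[i]))


# Toy text: "Le loup mange la chèvre." / "La chèvre court."
TEXT = [
    {"verb": ("action", "ingestion", "manger"),
     "agent": ("animate", "animal", "loup"),
     "object": ("animate", "animal", "chevre")},
    {"verb": ("action", "motion", "courir"),
     "agent": ("animate", "animal", "chevre")},
]

# Toy question: "Qui dévore la chèvre ?" (the agent is what is being asked)
QUESTION = {"verb": ("action", "ingestion", "devorer"),
            "object": ("animate", "animal", "chevre")}

hit = best_match(QUESTION, TEXT)
answer = TEXT[hit]["agent"][-1]
```

Here the question uses `dévorer` while the text uses `manger`, so exact word matching would fail. Because both verbs sit under the same assumed parent node, the first sentence is still the closest match, and its agent chunk supplies a meaningful answer without any stylistic post-processing.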
