Research on historical text corpora has a long history. Although researchers of the history of the Korean language have accumulated a large amount of digitized historical material, they have not been able to exploit it fully through computational processing. At present there are many historical corpora, including the Sejong historical corpus, compiled by institutions and individuals. Compared with corpus construction itself and with research in other fields, however, the development of tools for using these corpora efficiently is clearly lacking. The purpose of this study is to develop an analysis tool for raw historical corpora. A lexical analyzer for historical texts not only makes it faster to obtain the vocabulary data needed for research on historical vocabulary but also reduces its cost. It also supports the compilation and development of a historical dictionary of Korean. To this end, the study takes a corpus of printed ancient novels of almost 1.6 million basic rhythmic units, together with its analysis results (formal analysis), as the primary data for compiling the dictionaries on which the lexical analyzer is built. The first chapter surveys the scale of the historical corpora constructed so far and how they are used. The second chapter reviews the existing basic approaches to tagging and morphological analysis and the state of research on historical texts, and introduces the historical data used in this study. The third chapter describes the structure of the dictionaries used for lexical analysis and the method for building them.
The dictionary consists mainly of a language dictionary, a grammar dictionary, a stem dictionary and an appellations dictionary, all of which can be updated. These dictionaries are used to handle unknown words, and cases that still cannot be handled can be resolved more accurately by refining the dictionaries. The fourth chapter, based on the Hidden Markov Model, explains how lexical ambiguity is resolved with the Viterbi algorithm. A stochastic model is constructed for this disambiguation process, and when a frequency is zero, smoothing is applied to reduce its effect on the result. The fifth chapter discusses the construction and use of the historical-text lexical analyzer system. The sixth chapter presents the results of the lexical analyzer's analysis.
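
The disambiguation step of the fourth chapter can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes a first-order (bigram) Hidden Markov Model and add-one smoothing, whereas the abstract does not say which model order or smoothing method was chosen. The function names `train_hmm`, `log_prob` and `viterbi`, the tag labels and the English toy tokens are all hypothetical placeholders rather than entries from the corpus or dictionaries.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-token emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for token, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][token] += 1
            vocab.add(token)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, context, event, n_outcomes, k=1.0):
    """Add-k smoothed log P(event | context); a zero count keeps nonzero mass."""
    row = counts.get(context, {})
    total = sum(row.values())
    return math.log((row.get(event, 0) + k) / (total + k * n_outcomes))


def viterbi(tokens, trans, emit, tags, vocab):
    """Return the most probable tag sequence for a non-empty token list."""
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unseen tokens
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(trans, START, tag, n_tags)
                 + log_prob(emit, tag, tokens[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(tokens)):
        best.append({})
        for tag in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans, p, tag, n_tags))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            score += log_prob(emit, tag, tokens[i], n_words)
            best[i][tag] = (score, prev)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy training data: "run" is ambiguous between noun (N) and verb (V).
toy_corpus = [
    [("the", "D"), ("run", "N")],
    [("they", "P"), ("run", "V")],
    [("the", "D"), ("dog", "N")],
    [("they", "P"), ("eat", "V")],
]
model = train_hmm(toy_corpus)
print(viterbi(["the", "run"], *model))
print(viterbi(["they", "run"], *model))
```

In this toy setting the ambiguous token is tagged according to the preceding tag, and a token that never occurred in training can still be decoded, because smoothing keeps every probability above zero. A real analyzer would draw its candidate analyses from the dictionaries described in the third chapter rather than from a toy count table.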