Like the structuralism of the past, computational models of Korean morphology have so far been linear: they deal only with the segmentation or concatenation of morphemes rather than with the formation of a word's internal structure. Integrating such linear models with syntactic analysis requires an additional interface component between the morphological component and the syntactic component to bind morphemes into sentence constituents. Furthermore, the linear model is not semantically intuitive. Taking a word-syntactic point of view, this thesis proposes an integrated computational model that handles morpheme segmentation, the formation of syntactic elements (sentence constituents), and the internal structure of words. The model needs no additional components (e.g., an interface component) that would make it inefficient. Moreover, its word-internal analysis is invisible or only partially visible to the sentence-syntactic parser. In other words, the sentence-syntactic parser need not access individual morphemes; it accesses only the maximal units formed by combining morphemes through feature constraints and operations (from the sentence-syntactic point of view, these are the minimal, grammatical, or atomic units). The final goal of this thesis is to design and develop an algorithm that is computationally efficient enough for natural language applications and that can also serve as a simulation tool in linguistics.
The morphological component underlying the proposed algorithm for analyzing the syntactic structure of words consists of three elementary parts: the list of morphemes, which permits feature structures; the rules for morphological alternation; and the rules for word formation. The list of morphemes is stored syllable by syllable in a single dictionary organized as a TRIE data structure; the dictionary supplies linguistic information and also performs morpheme segmentation. The spelling rules that handle morpheme segmentation and morphological alternation are based on the two-level formalism. Unlike the early two-level formalism, they do not judge whether a connection between morphemes is legitimate; they only restore a morphologically alternated surface form to its lexical form. In addition, functional diacritics are proposed to incorporate categorial context into the two-level formalism, and two sorts of reciprocity constraints are proposed to capture the dependencies between two-level rules in the finite-state transducer, the low-level representation in which the rules are implemented. The proposed computational model uses a context-free grammar as its word formation rules to investigate the hierarchical relations between morphemes and lexical phrases, and it uses a method based on the GLR algorithm for parsing. The GLR parsing algorithm has typically been used for the syntactic analysis of sentences, whose units (words) have explicit boundaries, so it must be revised before it can be applied to the internal structure of words, whose units (morphemes) admit multiple boundary interpretations.
To achieve this, this thesis presents an algorithm whose stages are interleaved: at each step, by breadth-first search, a syllable fully traverses the nodes on multiple paths of the TRIE dictionary, passes through all vertices connected by matching edge paths in the finite-state transducer, and then builds its own graph-structured stack from the interpretations found.
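The breadth-first, syllable-by-syllable traversal of the TRIE dictionary can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm of this thesis: it covers only the dictionary step, omits the finite-state transducer and the graph-structured stack, and treats one Hangul character as one syllable. The names `TrieNode`, `build_trie`, `segment`, the toy `LEXICON`, and its category tags are hypothetical.

```python
from typing import Dict, List, Tuple

Morpheme = Tuple[str, str]  # (form, category tag); the tags are hypothetical


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.tags: List[str] = []  # categories of morphemes that end at this node


def build_trie(entries: List[Morpheme]) -> TrieNode:
    """Store each morpheme syllable by syllable (one character = one syllable)."""
    root = TrieNode()
    for form, tag in entries:
        node = root
        for syllable in form:
            node = node.children.setdefault(syllable, TrieNode())
        node.tags.append(tag)
    return root


def segment(word: str, root: TrieNode) -> List[List[Morpheme]]:
    """Return every segmentation of `word` into dictionary morphemes.

    All open hypotheses advance together, one syllable at a time
    (breadth-first), so competing morpheme boundaries are kept in parallel.
    """
    # Each hypothesis: (trie node, syllables of the open morpheme, closed morphemes)
    frontier = [(root, "", [])]
    for syllable in word:
        next_frontier = []
        for node, partial, done in frontier:
            child = node.children.get(syllable)
            if child is None:
                continue  # this path is dead for the current syllable
            form = partial + syllable
            # Option 1: keep extending the same morpheme.
            next_frontier.append((child, form, done))
            # Option 2: close the morpheme here and restart at the root.
            for tag in child.tags:
                next_frontier.append((root, "", done + [(form, tag)]))
        frontier = next_frontier
    # Only hypotheses with no open morpheme are complete analyses.
    return [done for node, partial, done in frontier if node is root and partial == ""]


# Toy lexicon, assumed purely for illustration.
LEXICON = [("가", "V"), ("지", "E"), ("가지", "N"), ("도", "J")]
analyses = segment("가지도", build_trie(LEXICON))
```

With this toy lexicon the surface string 가지도 has two boundary interpretations (가지 + 도 and 가 + 지 + 도), which is the kind of ambiguity a parser over morphemes has to carry forward rather than resolve at segmentation time.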
The morphological point of view adopted throughout this thesis is the weak lexicalist hypothesis, which holds that lexical transformations in generative grammar cannot be used in derivational morphology; that is, only derivation takes place in the morphological component, whereas inflection takes place in sentence syntax. However, whatever the stance (weak lexicalist, strong lexicalist, or even linear), the proposed model runs according to word formation rules that the user defines, and it therefore includes a rule parser for those rules.
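To make the idea of user-defined word formation rules concrete, here is a minimal Python sketch of a rule parser for context-free rules. The rule syntax (`LHS -> RHS`, `|` for alternatives, `#` for comments), the function name `parse_rules`, and the category names in the sample grammar are all assumptions made for illustration; the thesis's actual rule format is not given in this abstract.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand-side symbols)


def parse_rules(text: str) -> List[Rule]:
    """Parse lines of the assumed form `LHS -> A B | C` into context-free rules."""
    rules: List[Rule] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "->" not in line:
            raise ValueError(f"line {line_no}: missing '->'")
        lhs, rhs = (part.strip() for part in line.split("->", 1))
        if len(lhs.split()) != 1:
            raise ValueError(f"line {line_no}: left-hand side must be one symbol")
        for alternative in rhs.split("|"):
            symbols = tuple(alternative.split())
            if not symbols:
                raise ValueError(f"line {line_no}: empty right-hand side")
            rules.append((lhs, symbols))
    return rules


# Hypothetical grammar: derivation builds stems, and a word is a stem plus an ending.
SAMPLE_GRAMMAR = """
# word formation rules (illustrative only)
Word -> Stem Ending
Stem -> Root | Stem DerivSuffix
"""

rules = parse_rules(SAMPLE_GRAMMAR)
```

Because the grammar is plain data, a user who prefers a different morphological stance would only edit the rule text; the parsing machinery that consumes `rules` would stay the same.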
The proposed model has no mechanism for feature structures or for logical operations on them. However, to show that it can be migrated to a feature-based approach with only a simple modification of the algorithm, this thesis describes that modification and presents two kinds of samples, one for the phrase-structured approach and one for the feature-based approach. Finally, to demonstrate the efficiency of the proposed model, this thesis reports the elapsed analysis time for about 966,000 words.
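Like the structuralism of the past, existing computational models of Korean morphology are linear, concerned only with the segmentation or concatenation of morphemes rather than with the formation of a word's internal structure. Such linear computational mod...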