Resolving the Word Mismatch Problem Caused by Foreign Words and English in Korean Information Retrieval

In Korean text, the use of English words, with or without accompanying phonetic translations (transliterations), is growing rapidly. To make matters worse, the Korean transliteration of an English word can vary greatly. The mixed use of English words and their various Korean transliterations within the same document or document collection can cause a severe word mismatch problem in Korean information retrieval. When the user query and the document text use different transliterations of the same word, simple word matching cannot retrieve the document. Simple word matching also fails when the query uses a Korean transliteration and the document contains the English word, or vice versa.
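The mismatch can be illustrated with a minimal sketch. The documents, the whitespace tokenization, and the hand-written equivalence class below are hypothetical and serve only as an illustration; they are not taken from the thesis.

```python
# Hypothetical toy collection: three documents refer to "data" in three
# different surface forms (two Korean transliterations and the English word).
DOCS = {
    1: "데이터 마이닝 기법",
    2: "데이타 베이스 설계",
    3: "data 분석 도구",
    4: "정보 검색 모델",
}


def search(query_terms, docs=DOCS):
    """Return ids of documents sharing at least one exact token with the query."""
    terms = set(query_terms)
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms & set(text.split()))


# Simple word matching only finds the document with the identical spelling.
plain_hits = search(["데이터"])

# Assumed, hand-written equivalence class for the word "data".
DATA_CLASS = {"데이터", "데이타", "data"}

# Expanding the query with the whole class recovers the other two documents.
expanded_hits = search(DATA_CLASS)

print(plain_hits)     # [1]
print(expanded_hits)  # [1, 2, 3]
```

Here the equivalence class is given by hand; building such classes automatically is the hard part that the rest of the abstract addresses.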
To resolve the word mismatch problem, it is necessary to find equivalence classes among English words and their various Korean transliterations. However, constructing these equivalence classes is inherently difficult. There are two possible approaches. One is to transform, i.e. back-transliterate, foreign words into their original English words and to use the English words as canonical forms for indexing and querying. The other, which is proposed in this thesis, is to transliterate English words into Korean and to construct equivalence classes among foreign words by measuring the phonetic similarity between them. We call the former the back-transliteration approach and the latter the transliteration approach.
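As a rough sketch of the transliteration approach, the snippet below groups Korean spellings into equivalence classes by pairwise similarity. The similarity measure is an assumption: it compares Hangul strings after Unicode NFD decomposition into jamo using `difflib.SequenceMatcher`, which is only a stand-in for the phonetic similarity measure of the thesis (not described in this abstract). The threshold of 0.8 and the function names are likewise hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def jamo(word):
    """Decompose Hangul syllables into conjoining jamo via Unicode NFD."""
    return unicodedata.normalize("NFD", word)


def similarity(a, b):
    """Stand-in similarity: edit-style ratio over jamo sequences (assumption)."""
    return SequenceMatcher(None, jamo(a), jamo(b)).ratio()


def equivalence_classes(words, threshold=0.8):
    """Group words whose pairwise similarity reaches the threshold (union-find)."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the class representative, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)

    classes = {}
    for w in words:
        classes.setdefault(find(w), set()).add(w)
    return list(classes.values())


# Two spellings each of "data" and "digital" (illustrative input).
classes = equivalence_classes(["데이터", "데이타", "디지털", "디지탈"])
for cls in classes:
    print(sorted(cls))
```

Raising the threshold makes class membership more conservative: fewer variants are merged, but fewer unrelated words are wrongly treated as equivalent.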
The back-transliteration approach appears more convincing at first, since the original English word is unique whereas its Korean equivalent can be transliterated in multiple ways. In practice, however, it is harder to implement than the transliteration approach. This claim rests on three observations: (1) back-transliteration is inherently more difficult than transliteration; (2) Korean text generally contains many more foreign words, i.e. words written as Korean transliterations, than English words; and (3) English multi-word expressions are much harder to handle in the back-transliteration approach than in the transliteration approach. Based on these observations, we argue that the proposed transliteration approach is better suited to resolving the word mismatch problem than the previously proposed back-transliteration approach. The results of our information retrieval experiments support this argument.
Implementing either the transliteration approach or the back-transliteration approach is far from easy, since both require very good solutions to the following largely unsolved problems: foreign word extraction, automatic transliteration and back-transliteration, and phonetic similarity comparison between foreign words. Low performance in any one of these processing modules greatly degrades the final accuracy of equivalence class construction. In this thesis we proposed a solution for each of the following tasks: foreign word extraction, automatic Korean-English transliteration and back-transliteration, Korean phonetic similarity comparison, and Korean-English character alignment. Automatic character alignment is indispensable for automatically generating the training examples for transliteration and back-transliteration. Our character alignment algorithm was highly accurate, but the solutions for the other tasks were not good enough. Hence the generated equivalence classes turned out to be too poor for practical application. We concluded that, for practical use in Korean information retrieval, more effective solutions must be sought for foreign word extraction, automatic transliteration and back-transliteration, and Korean phonetic similarity comparison. In the current situation, a realistic way to avoid harming retrieval performance is to decide more conservatively whether a word belongs to an equivalence class.
