Korean Named Entity Recognition Using Word Embedding Features

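Named entity recognition is the task of identifying named entities such as person names, location names, and organization names in a sentence. Korean named entity recognition has been studied in various ways, but it suffers from a shortage of features compared with English named entity recognition. This paper proposes using word embeddings as features for Korean named entity recognition. Word vectors are generated with the CBOW (Continuous Bag-of-Words) model from a morphologically analyzed and POS-tagged corpus, and cluster information is generated from the word vectors with the K-means algorithm. The word vectors and the cluster information are used as word embedding features in CRFs (Conditional Random Fields). In the experiments, performance improved over the baseline system by 0.52%, 0.53%, and 0.82% in the TV, Sports, and IT domains, respectively. The proposed method also outperformed other Korean named entity recognition systems, which demonstrates its usefulness.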

Named Entity Recognition (NER) is the task of recognizing and classifying named entities such as person names, locations, and organizations. Various studies have addressed Korean NER, but they suffer from problems such as a lack of features compared with English NER. In this paper, we propose a method that uses word embeddings as features for Korean NER. We generate word vectors with the Continuous Bag-of-Words (CBOW) model from a POS-tagged corpus, and derive word cluster symbols from the word vectors with the K-means algorithm. We use the word vectors and word cluster symbols as word embedding features in Conditional Random Fields (CRFs). In our experiments, performance improves over the baseline system by 0.52%, 0.53%, and 0.82% in the TV, Sports, and IT domains, respectively. The proposed method also performs better than other NER systems, which demonstrates its effectiveness and efficiency.
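
The clustering and feature steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the word vectors have already been trained (for example with a CBOW model) and substitutes a tiny hand-made two-dimensional table, `toy_vectors`, for them. The names `kmeans`, `token_features`, the `C<n>` cluster symbols, and the `C_UNK` fallback for out-of-vocabulary tokens are all hypothetical choices made for illustration.

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Assumption: stand-in for CBOW word vectors (real ones have many more dimensions).
toy_vectors = {
    "Seoul":   [0.9, 0.1],
    "Busan":   [0.8, 0.2],
    "Samsung": [0.1, 0.9],
    "LG":      [0.2, 0.8],
}

words = list(toy_vectors)
labels = kmeans([toy_vectors[w] for w in words], k=2)
cluster_of = dict(zip(words, labels))

def token_features(tokens, i):
    """Feature dict for token i: surface form, cluster symbol, vector values."""
    word = tokens[i]
    feats = {"word": word}
    if word in toy_vectors:
        feats["cluster"] = "C%d" % cluster_of[word]
        for d, value in enumerate(toy_vectors[word]):
            feats["emb_%d" % d] = value  # real-valued embedding feature
    else:
        feats["cluster"] = "C_UNK"  # hypothetical symbol for unseen words
    return feats

sentence = ["Seoul", "visited", "Samsung"]
feats = [token_features(sentence, i) for i in range(len(sentence))]
```

Each element of `feats` is a per-token dictionary, a format that many CRF toolkits accept as input. The cluster symbol gives the CRF a discrete feature shared by similar words, while the `emb_*` entries expose the vector values directly.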
