Korean Text Extraction and Recognition in Broadcast News

Text in broadcast news is important because it carries substantial information about the content being broadcast. It offers a compact form of that content, sufficient for generating metadata. Much prior research on text extraction and recognition has assumed structured text color, font style, contrast, stationarity, and so on. In this thesis, a robust text extraction and recognition algorithm focused on broadcast news programs is proposed and implemented. To improve text extraction performance, an edge-map scheme is first proposed to localize the text region. Before recognition takes place, the text region is binarized using both dynamic local and global thresholds. Labeling and a multi-frame operation are then adopted to further reduce noise in the binarized region. Characters are segmented by vertical projection and by constraining each segment to a proper height/width aspect ratio. For Korean text recognition, a filter bank is applied for descriptor extraction, and a neural network classifier is applied to classify the consonant and vowel components. After each component is recognized, the results are mapped to the restricted set of characters used in broadcast news captions. Experimental results demonstrate that the proposed text extraction scheme is reliable and achieves substantially better recognition performance than conventional methods.
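The binarization and multi-frame steps can be illustrated with a minimal sketch. The abstract does not say how the local and global thresholds are combined or how frames are merged, so everything below is an assumption: text is taken to be brighter than the background, the global threshold is the region mean, the local threshold is a windowed mean, and frames are merged with a logical AND. The names `binarize` and `merge_frames` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def global_threshold(img: Image) -> float:
    """Mean intensity of the whole region (assumed stand-in for the global threshold)."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def local_mean(img: Image, r: int, c: int, radius: int) -> float:
    """Mean intensity in a (2*radius+1)^2 window, clipped at the image border."""
    rows = range(max(0, r - radius), min(len(img), r + radius + 1))
    cols = range(max(0, c - radius), min(len(img[0]), c + radius + 1))
    vals = [img[i][j] for i in rows for j in cols]
    return sum(vals) / len(vals)


def binarize(img: Image, radius: int = 1) -> Image:
    """Mark a pixel as text (1) only if it passes both the global and the local test."""
    g = global_threshold(img)
    return [
        [1 if p > g and p >= local_mean(img, r, c, radius) else 0
         for c, p in enumerate(row)]
        for r, row in enumerate(img)
    ]


def merge_frames(masks: List[Image]) -> Image:
    """Keep only pixels marked as text in every frame (assumed AND merge)."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if all(m[r][c] for m in masks) else 0 for c in range(width)]
        for r in range(height)
    ]
```

The AND merge relies on the idea that caption pixels persist across consecutive frames while background noise changes, so a bright speck present in only one frame drops out of the merged mask.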


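Text (captions) in broadcast news is very important because it provides information related to the news content. Text in news offers a condensed form of the content that is sufficient for generating metadata. Research on text extraction and recognition has so far proceeded under assumptions such as a fixed text color, font, contrast, and persistence over a certain period of time. This thesis proposes a robust text extraction and recognition algorithm focused on news programs and implements it as a system. To improve text extraction performance, an edge-map generation scheme is first proposed for locating the text region. In addition, as a step preceding text recognition, the region is binarized using dynamic local and global thresholds. Labeling and multi-frame merging are used to reduce noise in the binarized region. A character-level segmentation method based on vertical projection and the height/width ratio is proposed, and features are extracted with a filter bank for Hangul text recognition. A neural network is applied to recognize the separated jamo (consonant and vowel components), and after each jamo is recognized, the results are mapped to the restricted set of characters used in broadcast captions. Experimental results show that the proposed text extraction and recognition methods are more stable and achieve higher performance than existing methods.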
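The character segmentation step (vertical projection plus a height/width constraint) can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: the greedy merge rule, the `max_ratio` parameter, and the function names are assumptions introduced here.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary rows: 1 = text pixel, 0 = background


def vertical_projection(mask: Mask) -> List[int]:
    """Count text pixels in each column."""
    return [sum(col) for col in zip(*mask)]


def column_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) spans of consecutive non-empty columns."""
    runs: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(mask: Mask, max_ratio: float = 1.2) -> List[Tuple[int, int]]:
    """Merge neighbouring runs while the merged width stays within
    max_ratio * text height (assumed rule), so that a syllable split
    by an internal gap is rejoined into one character box."""
    height = len(mask)
    boxes: List[Tuple[int, int]] = []
    for start, end in column_runs(vertical_projection(mask)):
        if boxes and end - boxes[-1][0] <= max_ratio * height:
            boxes[-1] = (boxes[-1][0], end)  # extend the current character
        else:
            boxes.append((start, end))       # start a new character
    return boxes
```

A Hangul syllable often contains a vertical gap between its consonant and vowel strokes, which is why splitting on empty columns alone tends to over-segment. The ratio check is one simple way to rejoin such pieces, though a greedy merge like this one can also fuse two genuinely narrow neighbouring characters.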
