Named entity recognition is used to improve the performance of information retrieval, question answering, machine translation, and similar systems. Its targets are usually PLOs (persons, locations, and organizations). These are typically proper nouns or unregistered words, and traditional named entity recognizers exploit these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities: a title may consist of multiple phrases, a full sentence, or special characters. This makes it difficult to find the boundaries of title candidates.
In this work, we propose a method that extracts title named entities from news articles and automatically builds a named entity dictionary of titles. For candidate identification, word phrases enclosed in special symbols within a sentence are first extracted and then verified by an SVM that uses feature words and their distances. For classification of the extracted title candidates, an SVM is used with the mutual information of word contexts.
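The first step of candidate identification can be illustrated with a minimal sketch. It is not the authors' implementation: the symbol pairs in `SYMBOL_PAIRS`, the function name `extract_candidates`, the `window` size, and the word-based distance measure are all assumptions made for illustration, since the abstract does not specify them. The sketch only collects enclosed phrases together with nearby words and their signed distances, which are the kind of features the SVM verification step would consume.

```python
import re

# Hypothetical symbol pairs; the source does not list which symbols it uses.
SYMBOL_PAIRS = [("《", "》"), ("「", "」"), ("<", ">"), ('"', '"')]


def extract_candidates(sentence, window=3):
    """Return phrases enclosed in symbol pairs, with context words and distances."""
    candidates = []
    for open_sym, close_sym in SYMBOL_PAIRS:
        # Non-greedy match so that each enclosed phrase is captured separately.
        pattern = re.escape(open_sym) + r"(.+?)" + re.escape(close_sym)
        for match in re.finditer(pattern, sentence):
            left = sentence[:match.start()].split()
            right = sentence[match.end():].split()
            # Assumed distance measure: number of words from the candidate
            # boundary, negative to the left and positive to the right.
            context = [(w, -(i + 1)) for i, w in enumerate(reversed(left[-window:]))]
            context += [(w, i + 1) for i, w in enumerate(right[:window])]
            candidates.append({"title": match.group(1), "context": context})
    return candidates


if __name__ == "__main__":
    for cand in extract_candidates('The director of "Old Boy" won an award.'):
        print(cand["title"], cand["context"])
```

Each returned candidate would still need to be verified, because quotation marks and brackets also enclose ordinary quotes and emphasis; in the proposed method that decision is made by the SVM.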
The experiment was conducted on 19K news articles, with 90% used as training data and 10% as test data. Evaluation was performed on 200 sentences randomly selected from the test data. Evaluated as separate modules, the performance of title identification is 81.17% in F1-score and that of title classification is 92.92%. The performance of the integrated module is 81.09% in F1-score. The dictionary construction performance, measured after removing duplicate extracted titles, is 71.01% in F1-score.
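As a rough illustration of how a dictionary-level score can be computed once duplicates are removed, the sketch below treats the extracted titles and the reference titles as sets and computes the standard F1-score over them. The function name `dictionary_f1` and the exact-match comparison are assumptions; the abstract does not describe the matching criterion actually used.

```python
def dictionary_f1(extracted_titles, gold_titles):
    """F1-score over unique titles, assuming exact string matching."""
    extracted = set(extracted_titles)  # duplicates are removed here
    gold = set(gold_titles)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the experiment.
    print(dictionary_f1(["A", "A", "B", "C"], ["A", "B", "D"]))
```

Counting each title once means that a title extracted correctly many times contributes no more than a title extracted once, which is one plausible reason the dictionary-level score differs from the sentence-level scores.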