Automatic Extraction of Protein

Protein-protein interaction (PPI) extraction is a key precondition for constructing a protein knowledge network and is very important for biomedical research. This paper extracts directional protein-protein interactions from biological text using an SVM-based method. Experiments on the LLL05 corpus gave good results. The results show that dependency features are important for protein-protein interaction extraction and that features related to the interaction word are effective for judging the interaction direction. Finally, we analyze the effects of the different features and outline plans for the next step.

Keywords: Support vector machines (SVM), Bio-Entity Relation, Protein-Protein Interaction, Entity Relation Direction.

1 Introduction

With the rapid development of the life sciences, the biomedical literature has been growing very fast. At the same time, information extraction technology has matured. As a result, research on biomedical information extraction is becoming more and more important, and relation extraction is one of its most important tasks. It is practical in its own right, it is the foundation of relation databases and biological knowledge networks, and it plays a key role in relation prediction and drug production. Relation extraction has already become a research hotspot, but some problems remain. For instance, the results are not yet good enough, and important information such as the direction and type of a relation is ignored.
This paper approaches the problem from two aspects: improving the results and extracting more information about the relation, such as its direction. Based on the characteristics of the biomedical literature, we designed some new features and extracted relations with an SVM, a well-established machine learning model. The experiments showed good results.

2 Extracting Protein-Protein Interaction

Once protein names have been found, the relationships between them need to be ascertained. PPI extraction can be defined as a classification problem. When two protein names and one interaction word co-occur in a single sentence, the task reduces to inferring whether a PPI exists between the pair of proteins. So, first, the sentences were filtered by the simple rule that two protein names must co-occur in one sentence. Second, we used a trained SVM model to solve this classification problem.
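The filtering step described above can be sketched as follows. This is a minimal illustration only, not the procedure used in the experiments: the whitespace tokenisation, the function name `candidate_pairs`, the protein names and the interaction words are all assumptions made for the example.

```python
from itertools import combinations


def candidate_pairs(sentences, protein_names, interaction_words):
    """Return one candidate per protein pair that co-occurs in a sentence.

    Hypothetical helper: tokenisation is a plain whitespace split with
    trailing punctuation stripped, which real biomedical text would need
    to replace with a proper tokeniser and named-entity recogniser.
    """
    candidates = []
    for sentence in sentences:
        tokens = [tok.strip(".,;()") for tok in sentence.split()]
        # Distinct protein names, kept in order of first appearance.
        proteins = list(dict.fromkeys(t for t in tokens if t in protein_names))
        if len(proteins) < 2:
            continue  # co-occurrence rule: at least two proteins per sentence
        triggers = [t for t in tokens if t in interaction_words]
        for a, b in combinations(proteins, 2):
            candidates.append(
                {"sentence": sentence, "pair": (a, b), "triggers": triggers}
            )
    return candidates


# Made-up sentences, protein names and interaction words.
sentences = [
    "ProtA activates ProtB in vitro.",
    "ProtC was purified from cell extracts.",
    "ProtA binds ProtB and represses ProtC.",
]
proteins = {"ProtA", "ProtB", "ProtC"}
words = {"activates", "binds", "represses"}

cands = candidate_pairs(sentences, proteins, words)
for c in cands:
    print(c["pair"], c["triggers"])
```

Each candidate would then be turned into a feature vector and passed to the classifier, which decides whether the pair actually interacts.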
After relation extraction, we determined the direction of each relation, because direction is important for constructing a biological network. We also cast this problem as a classification task.

3 Results

The SVM model was trained on the standard LLL05 corpus (J. Hakenberg, et al., 2017) using the effective features (word features, POS features, logic features and dependency parsing features). In this experiment, we obtained 38,504 proteins and 51,568 PPIs between them with the SVM-based method.
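To make the classification step concrete, here is a minimal sketch of training a linear soft-margin SVM by sub-gradient descent on the regularised hinge loss. It is an illustration under stated assumptions, not the configuration used in the experiments: the source does not specify the kernel, the SVM implementation or the feature encoding, and the two binary features and the toy data below are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 (interaction) or -1 (no interaction).
    Hyperparameters are arbitrary choices for this toy example.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # The regularisation term shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample violates the margin: move the hyperplane towards it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 if the pair is classified as interacting, else -1."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical binary features for each candidate pair:
# [interaction word lies between the proteins, short dependency path exists]
X = [[1, 1], [1, 0], [0, 0], [0, 1]]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

In practice the feature vectors would be high-dimensional encodings of the word, POS, logic and dependency parsing features, and an established SVM library would normally replace this hand-written trainer.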
The SVM-based model trained on the LLL05 corpus achieves good performance: 82.4% precision, 73.7% recall and a 77.8% F-score. The experiments on the LLL05 corpus showed that the F value reached nearly 80% and that the new features improved the results considerably. In conclusion, the syntactic features improved both precision and recall, while the logic features improved recall. Moreover, the syntactic features gave good results even on their own.

Result of the protein-protein interaction experiment

feature      Word / POS   Word / POS / Logic   Word / POS / Syntax   Word / POS / Logic / Syntax
precision    81.82        75.00                91.67                 82.35
recall       47.37        47.37                57.89                 73.68
F value      60.00        58.06                70.97                 77.78
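The precision, recall and F value figures are related by the standard definitions, which the short sketch below makes explicit. The raw counts are an assumption: the source reports only percentages, and 14 true positives, 3 false positives and 5 false negatives are hypothetical values chosen because they are consistent with the reported 82.35 / 73.68 / 77.78.

```python
def precision_recall_f(tp, fp, fn):
    """Return precision, recall and balanced F value as percentages."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_value = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_value


# Hypothetical counts, not taken from the source.
p, r, f = precision_recall_f(tp=14, fp=3, fn=5)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Recomputing the F value from the already rounded precision and recall can differ in the last digit, which is why the sketch starts from counts.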

Result of the direction judgement experiment

feature      Physical / Clause       Subtree / Clause        Physical / Subtree / Clause
measure      direction   inverse     direction   inverse     direction   inverse
precision    83.33       100.00            80.00             83.33       100.00
recall       100.00      80.00                               100.00      80.00
F value      90.91       88.89             80.00             90.91       88.89
