| Biological event extraction presents, in a structured form, the fine-grained and complex relationships among biomolecules that are latent in huge amounts of biomedical literature. It is widely used in systems biology, where it provides an important basis for disease diagnosis, prevention, treatment, new drug development, and life science research. A complete biological event consists of a trigger, which signals the occurrence of the event, and its participants (arguments). On the one hand, the category of a biological event is determined by the type of its trigger, and the results of trigger identification directly affect the performance of argument detection; trigger identification is thus the core task of biological event extraction. On the other hand, argument detection identifies the participants of an event so that a complete biological event can be assembled, which makes it essential for generating events. This thesis therefore focuses on the two main problems of biological event extraction: trigger identification and argument detection. The main contents are as follows. For trigger identification with statistical machine learning, a method based on two stages and feature selection is proposed. The first stage judges whether the current word is a trigger; the second stage classifies the predicted triggers into specific types. The two-stage method decomposes a complex classification problem into two simpler sub-problems, which makes the task easier to handle. It also alleviates the class imbalance problem in the corpus and improves the performance of trigger identification. In addition, a feature selection algorithm chooses different features for each stage, which further improves classification performance in each stage. This method obtains good performance on multiple biological event extraction
corpora. To reduce manual effort and further improve the performance of trigger identification, this thesis explores trigger identification methods based on optimized data representation and deep learning. A bidirectional LSTM trigger detection model (Sentence Embeddings and Attention Based BLSTM Neural Network, SE-Att-BLSTM) based on sentence vectors and a word-level attention mechanism is proposed. Sentence vectors capture sentence-level features and the event-related information within the sentence, which improves the performance of trigger identification. The word-level attention emphasizes the key information in the sentence, which reduces the loss of important context when an LSTM processes long sentences. In addition, the two-stage method is combined with the SE-Att-BLSTM model to identify event triggers, which achieves state-of-the-art performance on a commonly used biological event extraction corpus. According to the complexity of their structure, biological events can be divided into simple and complex biological events, and the two kinds differ structurally. However, existing methods treat them uniformly and do not consider the interaction among arguments, and their performance on complex event extraction is low. This thesis therefore makes a fine-grained distinction between the arguments of simple biological events and those of complex biological events. It also defines the "relevant argument", based on the structural characteristics of complex event arguments. An argument detection model (Multi-level Attention Based BLSTM Neural Network, Mul-Att-BLSTM) based on a bidirectional LSTM and a multi-level attention mechanism is proposed. The word-level attention strengthens the crucial information within the argument candidate. The sentence-level attention enhances the influence among relevant arguments. Finally, the complete biological
events are assembled by post-processing based on a machine learning method. The proposed model achieves good performance on a commonly used biological event extraction corpus and, in particular, further improves the performance of complex biological event extraction. In summary, this thesis addresses the problems of trigger identification and argument detection with statistical machine learning methods and LSTM neural networks. The two-stage method is integrated, and sentence vectors are constructed. Furthermore, multi-level attention is proposed to enhance the mutual influence among relevant arguments. In addition, the fine-grained argument types of simple and complex events are distinguished. This thesis achieves state-of-the-art performance in biological trigger identification and event extraction. However, considerable room remains for further study in future work. |
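
The two-stage decomposition of trigger identification described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy lexicon, and the event type labels below are all hypothetical stand-ins, and the two stages are plain lookups rather than the trained classifiers (with stage-specific feature sets) that the abstract describes. The sketch only shows the control flow: stage one makes a binary trigger/non-trigger decision for every word, and stage two assigns a type only to the words that stage one accepted.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon standing in for trained models.
# The type labels are illustrative only and are not taken from the thesis.
TOY_LEXICON = {
    "inhibits": "Negative_regulation",
    "expression": "Gene_expression",
    "phosphorylation": "Phosphorylation",
}


def toy_is_trigger(tokens: List[str], i: int) -> bool:
    """Stage 1 stand-in: binary decision, trigger or not."""
    return tokens[i].lower() in TOY_LEXICON


def toy_trigger_type(tokens: List[str], i: int) -> str:
    """Stage 2 stand-in: assign a specific type to a predicted trigger."""
    return TOY_LEXICON[tokens[i].lower()]


def two_stage_identify(
    tokens: List[str],
    is_trigger: Callable[[List[str], int], bool],
    trigger_type: Callable[[List[str], int], str],
) -> List[Tuple[int, str, str]]:
    """Run stage 1 on every token, then stage 2 only on accepted tokens."""
    results = []
    for i, token in enumerate(tokens):
        if is_trigger(tokens, i):          # stage 1: detection
            label = trigger_type(tokens, i)  # stage 2: typing
            results.append((i, token, label))
    return results


if __name__ == "__main__":
    sentence = "IL-2 inhibits expression of STAT3".split()
    for index, word, label in two_stage_identify(
        sentence, toy_is_trigger, toy_trigger_type
    ):
        print(index, word, label)
```

Because stage two never sees non-trigger words, its training set would contain only trigger types, which is one way to read the abstract's claim that the decomposition alleviates class imbalance.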