With the development of biomedicine and internet technology, digital textual resources in the biomedical field, such as related materials, documents, and data, have grown rapidly in recent years. These massive biomedical resources contain diverse, cutting-edge biomedical knowledge and are an important source of knowledge for practitioners. Quickly and accurately discovering the specific knowledge that is actually needed in such a large body of digital biomedical text requires intelligent and effective technical tools that can meet the challenges of the information explosion era. This thesis therefore focuses on key methods of biomedical text mining and uses deep learning to study four tasks in depth: biomedical text classification, named entity recognition, relation extraction, and trigger extraction.

For biomedical text classification, existing neural network models fail to introduce and use domain knowledge fully and effectively, so a hierarchical domain knowledge-aware method for text classification is proposed. This method introduces complete entity information and standardized concept information, and designs a hierarchical knowledge induction attention module that enhances the information interaction between domain knowledge and the original input at both the word level and the sentence level. Multiple evaluation results on the protein-protein interaction article classification data set show that the model makes more effective use of domain knowledge and reaches advanced performance.

For biomedical named entity recognition, annotated training samples in a specific field are insufficient, and a single type of word vector cannot fully express the implicit semantic features of the input sequence. To address this, a named entity recognition method based on multi-task learning and multiple features is proposed. This method builds a multi-channel neural network from multiple combinations of word vectors, character vectors, and contextualized vectors, designs a feature integration method that adaptively fuses the output representations of the channels, and introduces auxiliary corpora with the same entity types for collaborative training under a multi-task learning strategy. Experiments show that the multi-task learning strategy and the multi-channel mechanism significantly improve performance.

For biomedical relation extraction, a method based on a dual piecewise attention neural tensor network is proposed to address the dense distribution of entity pairs in complex long sentences and the class imbalance of clinical data. This method uses a tensor weight matrix to improve the encoding of the neural network. In addition, a dual piecewise attention module is proposed to strengthen the model's feature extraction for complex long sentences, and a weight-adaptive cross-entropy loss function is designed. Experimental results show that this method achieves advanced performance on the clinical relation extraction data set.

For biomedical trigger extraction, current shallow neural networks extract insufficient features, and triggers are ambiguous and polysemous, so a trigger extraction method based on a multi-layer residual gated neural network is proposed. Taking into account the label dependence within trigger phrases, this method treats trigger word extraction as a sequence labeling task, designs a residual BiLSTM-CRF architecture with a gating mechanism to iteratively capture and extract features, and uses ELMo and BiLSTM to compute the contextual representations and the internal character representations of triggers, respectively. Experiments show that this method achieves competitive performance on the biomedical trigger extraction data set.
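
To illustrate what treating trigger extraction as a sequence labeling task can look like, the following minimal sketch converts annotated trigger spans into per-token labels. The BIO tagging scheme, the function name `spans_to_bio`, the example sentence, and the event type names are all assumptions made for illustration; the abstract does not state which label scheme or data format the thesis uses.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert trigger spans into BIO tags, one tag per token.

    Each span is (start, end_exclusive, event_type). The BIO scheme is an
    assumed labeling convention, not one confirmed by the source.
    """
    tags = ["O"] * len(tokens)
    for start, end, event_type in spans:
        # The first token of a trigger phrase gets a B- tag.
        tags[start] = f"B-{event_type}"
        # Any remaining tokens of the same phrase get I- tags.
        for i in range(start + 1, end):
            tags[i] = f"I-{event_type}"
    return tags


# Hypothetical example sentence and trigger annotations.
tokens = ["IL-2", "gene", "expression", "is", "up", "regulated", "by", "NF-kB"]
spans = [(2, 3, "Gene_expression"), (4, 6, "Positive_regulation")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under such a scheme, a multi-word trigger like "up regulated" yields a B- tag followed by an I- tag, which is the kind of label dependence within trigger phrases that a CRF output layer is meant to model.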