Biomedical literature, which is growing exponentially, and massive volumes of social text contain valuable biomedical knowledge and are rich resources for life-science research. It is therefore urgent to develop effective text-mining technology that automatically extracts accurate information from biomedical texts. To obtain useful information from such a large number of texts, we need to reduce the number of text categories, filter out invalid text, and retrieve the required information from the target text; the method used at each stage is essential to the overall performance of knowledge mining. This dissertation therefore first focuses on semantic textual similarity estimation and text classification, and takes biomedical relation extraction as its final goal. Based on the characteristics of biomedical literature and social text, it analyzes in depth the shortcomings of existing methods on these three tasks and centers on semantic enhancement through semantic interaction and external knowledge.

First, methods from the general domain are ill-suited to biomedical literature, whose sentences are long and syntactically complex, which leads to a long-range dependency problem; existing methods also ignore the mutual semantics between sentences. An interactive self-attention mechanism is therefore proposed. The attention operation enhances semantics via the interaction between the two sentences of a sentence pair and alleviates the long-range dependency problem: it enlarges the semantic difference between dissimilar sentence pairs and reduces it between similar pairs, improving the performance of semantic textual similarity estimation. Furthermore, to address the possible semantic loss caused by vector averaging in interactive attention and by context-independent word embeddings, a cross self-attention (CSA) mechanism is proposed, and context-dependent word embeddings generated by a pre-trained model are employed to overcome the shortcomings of traditional word embeddings. Experimental results indicate that CSA yields better semantically enhanced information and that context-dependent semantic representations improve performance.

Second, existing methods perform well on biomedical literature text classification but poorly on social text, owing to its sparsity and insufficient semantic representation. To tackle this issue, together with the insufficiency of emotional expression in social texts, an effective hybrid model is proposed for detecting adverse drug reaction (ADR) tweets. It builds a base of drug-ADR co-occurrence pairs from a related knowledge base and a medical website and extracts co-occurrence sub-sentences from tweets. The interaction between a co-occurrence sub-sentence and the original text then enhances and complements the insufficient semantic expression of the short social text. A model pre-trained on a large-scale sentiment analysis corpus generates sentence-level emotional context, which is combined with emotional word scores to express emotion more fully. Experimental results show that the proposed model improves ADR classification by enriching emotion representation and semantically enhancing short text.

Finally, for document-level relation extraction, existing methods ignore i) the semantic interaction among the abstract, the title, the shortest dependency path, and the knowledge representation; ii) the different contributions of different sentences to the semantic representation of the whole document; and iii) the collection of target-entity semantics from the whole document. Hence, a document-level R-BERT based on semantically enhanced information and knowledge representation is proposed. It uses CSA to obtain mutual semantic information, enhancing the semantic representations of the title and abstract and capturing a more sufficient representation of the whole document. Meanwhile, a Gaussian probability distribution is introduced to compute the weights of the co-occurrence sentence and its adjacent entity sentences, so that the model learns not only the local semantics of the co-occurrence sentences but also the global semantics of the whole article, collecting more effective semantic information. Additionally, document-level R-BERT collects target-entity semantics from the whole document, yielding a more complete target-entity representation. Experimental results show that the method achieves state-of-the-art performance on document-level relation extraction.
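The interaction-based attention described above can be illustrated with a minimal sketch. The function name, the toy dimensions, and the scaled dot-product form are illustrative assumptions rather than the dissertation's exact formulation; the idea is that each token of one sentence attends over the tokens of the other, so each sentence's representation is re-expressed in terms of its partner's semantics:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(A, B):
    """Re-express each token of sentence A as an attention-weighted
    mixture of sentence B's tokens, and vice versa, so that each
    sentence's representation carries its partner's semantics.
    A: (len_a, d) token embeddings; B: (len_b, d)."""
    scores = A @ B.T / np.sqrt(A.shape[1])  # pairwise token similarity
    a_enriched = softmax(scores, axis=1) @ B    # A seen through B
    b_enriched = softmax(scores.T, axis=1) @ A  # B seen through A
    return a_enriched, b_enriched

# Toy sentence pair: 5 and 7 tokens, embedding dimension 8.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
B = rng.standard_normal((7, 8))
a_enriched, b_enriched = cross_attend(A, B)
```

Because every enriched token is a convex combination of the partner sentence's tokens, similar pairs are pulled toward a shared representation while dissimilar pairs stay apart, which is the effect the dissertation exploits for similarity estimation.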
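The co-occurrence sub-sentence extraction for ADR tweets can likewise be sketched. The tiny hand-made drug and ADR lexicons below are hypothetical placeholders; the dissertation builds its pair base from a knowledge base and a medical website:

```python
import re

# Hypothetical toy lexicons standing in for the drug-ADR pair base.
DRUGS = {"seroquel", "paxil"}
ADRS = {"insomnia", "nausea", "headache"}

def cooccurrence_subsentences(tweet):
    """Split a tweet into clause-level sub-sentences and keep those in
    which a drug name and an ADR term co-occur."""
    subs = re.split(r"[.,;!?]", tweet.lower())
    hits = []
    for sub in subs:
        tokens = set(sub.split())
        if tokens & DRUGS and tokens & ADRS:
            hits.append(sub.strip())
    return hits

hits = cooccurrence_subsentences(
    "started seroquel last week and now i have insomnia, otherwise fine")
# hits → ["started seroquel last week and now i have insomnia"]
```

The extracted sub-sentence is then paired with the original tweet so their interaction can compensate for the sparsity of the short text.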
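The Gaussian sentence weighting can be sketched as follows; the value of sigma and the exact normalization are assumptions, not the dissertation's parameterization. Sentences near the co-occurrence sentence receive high weight (local semantics) while distant sentences still contribute (global semantics):

```python
import math

def gaussian_sentence_weights(n_sentences, center, sigma=1.5):
    """Weight each sentence of a document by a Gaussian centred on the
    co-occurrence sentence index, so that adjacent entity sentences
    dominate while the rest of the article still contributes."""
    raw = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
           for i in range(n_sentences)]
    total = sum(raw)
    return [w / total for w in raw]

# 7-sentence document whose co-occurrence sentence is sentence 3.
weights = gaussian_sentence_weights(7, center=3)
```

The weighted sum of sentence representations under these weights gives the document representation a soft focus on the co-occurrence region without discarding the remaining sentences.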