Drug-Drug Interaction Extraction (DDIE) is the task of extracting Drug-Drug Interactions (DDIs) from biomedical literature using text processing technology, which could ensure real-time updates and high coverage of drug databases. Taking biomedical texts as the research object, this thesis carries out the following research on the DDIE task:

(1) Existing DDIE methods mainly rely on external drug information to achieve better extraction performance, but collecting and using external information consumes additional time and computing resources. In this thesis, we propose a novel method based on Key Semantic Sentence (KSS) and Gradient Harmonizing Mechanism (GHM) to extract DDIs. The method reduces the cost in time and computational resources while achieving performance comparable to methods that use external drug information. First, we identify two causes of performance degradation in DDI extraction: the mismatch between drug entity pairs and DDI relation words, and the label noise introduced by mislabeled instances. We emphasize the drug entity pair by retaining the drug names and adding drug entity markers, and we delete mismatched DDI relation words by using the KSS. These steps significantly reduce the mismatch problem. Then, we employ the GHM loss to reduce the weight of mislabeled instances and thereby alleviate the label-noise problem. The proposed model achieves state-of-the-art results with an F1-score of 84.13% on the DDIExtraction 2013 corpus, which closes the performance gap (4%) between methods that rely on external drug information and those that do not.

(2) Existing DDIE methods are all based on supervised learning, which is easily affected by the quantity and quality of annotated instances in the training set. We propose a two-stage semi-supervised DDIE method based on consistency training. The method uses many unlabeled instances to assist in training the DDIE model through consistency training, thereby reducing the model’s demand for labeled instances. Further, we propose a two-stage DDIE method that incorporates drug knowledge to alleviate the imbalance between positive and negative instances, which causes the positive information in unlabeled instances to be submerged so that it cannot be used effectively. By varying the proportion of annotated instances in the DDIExtraction 2013 corpus, we show that the proposed method achieves 2.35 times the performance of the supervised model at a 10% labeling ratio, and that at a 40% labeling ratio it surpasses the supervised model trained on fully labeled instances, achieving an F1 value of 80.21%.

(3) We also design a prototype system to extract DDIs from biomedical literature, which can recognize drug entities and extract DDIs. The system can help experts in the biomedical field quickly and accurately extract valuable DDI information from biomedical literature, so that drug databases are updated in a timely and effective manner.

When annotated instances are sufficient, our proposed supervised DDIE method can efficiently extract DDIs from biomedical texts without relying on external drug information; when annotated instances are insufficient, our proposed semi-supervised DDIE method can effectively leverage unlabeled instances and drug knowledge to achieve good DDI extraction performance.
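
The GHM-based reweighting mentioned in (1) can be illustrated with a minimal sketch. It follows the general idea of gradient harmonizing, in which each instance is weighted inversely to the density of instances with a similar gradient norm, and it is not the implementation used in this thesis. The function names `ghm_weights` and `ghm_cross_entropy`, the choice of 10 bins, and the use of 1 minus the predicted probability of the gold class as the gradient norm in the multi-class case are all assumptions made for illustration.

```python
import math


def ghm_weights(grad_norms, num_bins=10):
    """Weight each instance inversely to the gradient density of its bin.

    grad_norms: values in [0, 1]; near 0 = easy, near 1 = very hard
    (possibly mislabeled). num_bins is an illustrative choice.
    """
    n = len(grad_norms)
    # Assign every gradient norm to one of num_bins equal-width bins.
    bin_ids = [min(int(g * num_bins), num_bins - 1) for g in grad_norms]
    counts = [0] * num_bins
    for b in bin_ids:
        counts[b] += 1
    # Gradient density = instances in the bin / bin width = count * num_bins.
    # Weight = n / density, so crowded bins are down-weighted.
    return [n / (counts[b] * num_bins) for b in bin_ids]


def ghm_cross_entropy(probs, labels, num_bins=10):
    """Density-weighted cross-entropy, averaged over the batch.

    probs: list of per-class probability lists (one list per instance).
    labels: list of gold class indices.
    """
    p_true = [p[y] for p, y in zip(probs, labels)]
    # Assumption: gradient norm approximated by 1 - p(gold class).
    grad_norms = [1.0 - p for p in p_true]
    weights = ghm_weights(grad_norms, num_bins)
    losses = [-math.log(max(p, 1e-12)) for p in p_true]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy batch: five easy instances, three very hard ones, one in between.
toy_norms = [0.02] * 5 + [0.97] * 3 + [0.5]
toy_weights = ghm_weights(toy_norms)
```

In this toy batch the single mid-difficulty instance receives the largest weight, while the crowded groups of very easy and very hard instances are down-weighted. Under the assumption that mislabeled instances cluster at high gradient norms, this is the effect the abstract describes as reducing the weight of mislabeled instances.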