Research On Multi-label Text Classification With External Information For Biomedical Domain

Posted on: 2023-12-11  Degree: Master  Type: Thesis
Country: China  Candidate: C C Ye  Full Text: PDF
GTID: 2530307061453904  Subject: Software engineering
Abstract/Summary:
COVID-19 has swept across the world and poses a serious threat to human life and health. Rapid developments in the biomedical domain can provide rich knowledge support for the treatment of coronavirus disease. However, the amount of biomedical literature containing life science knowledge has also grown exponentially, and helping researchers retrieve and screen useful literature quickly and accurately is a great challenge. MEDLINE databases often describe each article with multiple predefined labels so that it can be retrieved quickly, and assigning these labels can be viewed as multi-label text classification. Labeling documents manually is very slow and costs a great deal of manpower and resources. The rapid development of natural language processing technology and deep learning makes it possible to automate the text classification process. Targeting multi-label text classification in the biomedical domain, this thesis analyzes the characteristics of biomedical literature and the shortcomings of existing methods, and focuses on how to introduce external information.

First, existing methods consider only the textual information of the literature and ignore its metadata. This thesis proposes a multi-label biomedical text classification model, MHG-MLTC, which uses heterogeneous graph neural networks to introduce metadata information. The model uses heterogeneous graph neural networks to model the heterogeneity of metadata, thereby modeling the importance of different types of metadata. Experimental results show that MHG-MLTC achieves better classification performance than existing methods, and ablation experiments show that the various types of metadata information effectively improve the model's classification performance.

Second, existing methods usually capture only part of the label information and cannot fully model the semantic information, hierarchical structure information, and statistical dependency information of labels. This thesis proposes a multi-label biomedical text classification model, MEI-MLTC, which introduces various kinds of external information. Building on the MHG-MLTC model, MEI-MLTC uses a label heterogeneous graph to model the semantic information of labels together with the structural relationships and dependencies between labels. In addition, to obtain better label-specific text representations, a label semantic-aware document representation layer is designed to introduce label semantic information. Experimental results show that MEI-MLTC outperforms existing methods, and ablation experiments show that both types of label relationships are effective in multi-label biomedical text classification tasks.
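
To make the idea of a label-specific text representation concrete, the following is a minimal sketch of generic label attention. It is not the thesis's MHG-MLTC or MEI-MLTC implementation. The function names, the toy two-dimensional vectors, the label names, and the use of a dot product as the per-label score are all assumptions chosen for illustration. Each label embedding attends over the token vectors to produce its own document vector, and each label is then scored independently with a sigmoid, so that a document can receive zero, one, or several labels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_aware_representations(token_vecs, label_vecs):
    """Return one document vector per label (generic label attention).

    For each label embedding, attention weights over the tokens are the
    softmax of label-token dot products; the document vector for that
    label is the weighted sum of the token vectors.
    """
    dim = len(token_vecs[0])
    reps = []
    for label in label_vecs:
        weights = softmax([dot(label, tok) for tok in token_vecs])
        rep = [sum(w * tok[d] for w, tok in zip(weights, token_vecs))
               for d in range(dim)]
        reps.append(rep)
    return reps


def predict_labels(token_vecs, label_vecs, label_names, threshold=0.5):
    """Score every label independently and keep those above the threshold.

    Assumption: the logit for a label is the dot product between its
    label-specific document vector and its own embedding.
    """
    reps = label_aware_representations(token_vecs, label_vecs)
    probs = [1.0 / (1.0 + math.exp(-dot(rep, label)))
             for rep, label in zip(reps, label_vecs)]
    chosen = [name for name, p in zip(label_names, probs) if p >= threshold]
    return chosen, probs


# Toy inputs (hypothetical): two token vectors and two label embeddings.
tokens = [[2.0, 0.0], [0.0, 2.0]]
labels = [[1.0, 0.0], [-1.0, -1.0]]
names = ["COVID-19", "Neoplasms"]
predicted, probabilities = predict_labels(tokens, labels, names)
```

Because each label has its own sigmoid rather than sharing one softmax, the probabilities need not sum to one, which is what distinguishes multi-label from single-label classification.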
Keywords/Search Tags: Biomedical text, Multi-label text classification, Deep learning, Heterogeneous graph neural network