
Research On Text Classification And Event Detection For Biomedicine

Posted on: 2022-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
GTID: 2480306509984609
Subject: Computer Science and Technology
Abstract/Summary:
As people pay more attention to health care, biomedicine has been developing rapidly, and the electronic biomedical literature has become one of its most important resources. The amount of information is increasing exponentially, so it is important for the health field to extract and make full use of the latent knowledge in this mass of information for medical research. Efficiently converting unstructured data into structured data is an important task in data mining. This thesis uses text classification and event detection to perform information extraction in the biomedical domain, with the aim of assisting medical practice and supporting precision medicine.

Biomedical text classification mines information from a coarse-grained perspective. It is one of the basic natural language processing tasks and the first step in the text mining process, and it can help medical workers obtain useful information from documents quickly. This thesis studies the classification of English long texts and of Chinese short texts. To address the long-distance dependencies in English long texts, the HACN (hierarchical attention-based capsule network) model is proposed. The capsule network captures local features to improve precision, and the hierarchical attention mechanism captures global features to improve recall. Combining the two improves the overall performance of the text classification system. The model is evaluated experimentally on the three corpora of the BioCreative task, and results improve substantially on all three. To address the limited information in Chinese short texts, this thesis uses ensemble learning for classification. First, a semantically enhanced model is obtained by fine-tuning BERT on target-domain data; this model is then fused with deep learning models to form an ensemble classifier. The classifier achieved the highest F1 on the corpus of the third CHIP2019 task.

Biomedical event detection mines information from a fine-grained perspective. It aims to detect triggers in sentences and classify them into predefined event types, which benefits many applications, such as summarization and reading comprehension. To address the long-tail issue, this thesis proposes the BIndGAC (bidirectional independent GRU-Attention-CRF) model, which uses BIO tags to recognize triggers. Word vectors are produced by BioBERT, which incorporates biomedical knowledge. This method effectively mines deep textual information and improves the performance of the trigger recognizer. The validity of the proposed method is verified by experiments on the MLEE corpus.
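As an illustration of how BIO tags can mark triggers, the following minimal Python sketch decodes a per-token tag sequence into trigger spans with event types. It is not the thesis's BIndGAC model: the function name `decode_bio_triggers`, the example sentence, and the event-type labels are assumptions made only for this example, and the tags are supplied by hand rather than predicted by a tagger.

```python
from typing import List, Tuple


def decode_bio_triggers(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIO tags into (trigger text, event type) pairs.

    "B-X" starts a trigger of type X, "I-X" continues it, "O" is outside.
    An "I-" tag that does not continue an open trigger of the same type
    is treated like "O" (a simplifying assumption of this sketch).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = ""

    def close_span() -> None:
        # Emit the trigger collected so far, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_span()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            close_span()
            current_tokens = []
            current_type = ""
    close_span()
    return spans


# Hypothetical sentence and labels, for illustration only.
tokens = ["VEGF", "induces", "blood", "vessel", "development"]
tags = [
    "O",
    "B-Positive_regulation",
    "B-Blood_vessel_development",
    "I-Blood_vessel_development",
    "I-Blood_vessel_development",
]
triggers = decode_bio_triggers(tokens, tags)
print(triggers)
```

In a full system, the tag sequence would come from the sequence labeler's output layer (a CRF in the model named above) rather than being written by hand.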
Keywords/Search Tags: Natural Language Processing, Biomedical Information Extraction, Text Classification, Event Detection, Trigger Recognition