
A Genetic Risk Information Extraction Method Based On Medical Literature Mining

Posted on: 2022-04-06  Degree: Master  Type: Thesis
Country: China  Candidate: C H Lv
GTID: 2494306740483174  Subject: Software engineering
Abstract/Summary:
With the development of biomedical technology, researchers must spend considerable time selecting genetic-risk-related literature and extracting genetic risk information from it. Automatic classification of genetic risk literature and extraction of genetic risk relations is therefore an important topic in biomedical science. Genetic risk studies are scattered across a large body of biomedical literature, and medical literature is highly specialized, so trained researchers are needed to identify the relevant papers. At the same time, genetic risk relations in medical texts are dispersed and often involve multiple co-occurring loci, which poses great challenges for existing genetic risk information extraction methods. Against this background, this thesis studies methods for extracting genetic risk information based on medical literature mining. The main research contents are as follows:

(1) A knowledge-enhanced multi-channel CNN (KMCNN) method is proposed for genetic risk text classification. KMCNN maps the biomedical entities in genetic risk papers to a medical vector through UMLS (Unified Medical Language System) and generates multiple input channels for the same text from different pre-trained word vectors. An LSTM (Long Short-Term Memory) network captures the spelling, prefix, and suffix information of words, and a CNN model then classifies the text. Ablation, parameter sensitivity, and comparative experiments verify the effectiveness of KMCNN for genetic risk literature classification.

(2) A self-attention-based named entity recognition (S-NER) scheme is proposed for the different types of entities in genetic risk literature. For locus entities and p-value entities, a rule-based approach with several rule matching schemes is used. For disease entities, a Bi-LSTM-CRF model based on the self-attention mechanism is proposed. In this model, a CNN extracts character-level word features, word position information strengthens the labeling constraints, and a self-attention layer captures richer semantic information; the final labels are produced by the classic Bi-LSTM-CRF model. Ablation and comparative experiments verify that the proposed scheme achieves good performance on public datasets.

(3) A self-training semi-supervised relation extraction (ST-SRE) method is proposed to extract genetic risk relations. For each identified pair of genetic risk entities, the relation between them must be determined, so the relation extraction task is cast as a relation classification task. Starting from a small number of labeled samples, a self-training model generates high-quality labeled data from distantly supervised data, which is then used to train a relation classification model. Ablation and comparative experiments verify the effectiveness of the proposed scheme.

(4) A genetic risk information extraction tool is designed and implemented. The system integrates the genetic risk text classification method, the named entity recognition scheme, and the relation extraction method. Users upload the literature to be processed through a web page. The system first determines whether each paper is related to genetic risk, then extracts genetic risk information from the relevant papers and displays it on the web page. Finally, users can download the extracted information in JSON format.
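
As a rough illustration of the rule-based matching mentioned in contribution (2) and the JSON output mentioned in contribution (4), the following Python sketch extracts variant identifiers and p-values with regular expressions. The abstract does not give the thesis's actual rules, so everything here is an assumption: the two patterns, the function name `extract_risk_mentions`, and the output keys `loci` and `p_values` are hypothetical, and variant sites are assumed to be written as dbSNP-style identifiers ("rs" followed by digits).

```python
import json
import re

# Assumption: variant sites appear as dbSNP-style identifiers, e.g. "rs429358".
RSID_PATTERN = re.compile(r"\brs\d+\b")

# Assumption: p-values appear as "P = 3.2e-8", "p-value > 0.05" or "p = 5 x 10^-8".
# Other notations (for example a Unicode minus sign) are not handled here.
PVALUE_PATTERN = re.compile(
    r"\bp\s*(?:-?\s*value)?\s*[=<>≤]\s*"
    r"(\d+(?:\.\d+)?(?:\s*[x×]\s*10\s*\^?\s*-?\d+|e-?\d+)?)",
    re.IGNORECASE,
)


def extract_risk_mentions(text):
    """Return variant identifiers and p-value strings found in the text.

    The keys "loci" and "p_values" are illustrative names, not the
    output schema of the tool described in the thesis.
    """
    # dict.fromkeys removes repeated identifiers while keeping first-seen order.
    loci = list(dict.fromkeys(RSID_PATTERN.findall(text)))
    # findall returns only the captured numeric part of each p-value.
    p_values = PVALUE_PATTERN.findall(text)
    return {"loci": loci, "p_values": p_values}


if __name__ == "__main__":
    sentence = (
        "The variant rs429358 was associated with the disease (P = 3.2e-8), "
        "while rs7412 showed no effect (p-value > 0.05)."
    )
    # Serialize the extracted mentions as JSON, as a downloadable result might be.
    print(json.dumps(extract_risk_mentions(sentence), indent=2))
```

Rules of this kind are easy to audit and need no training data, which is presumably why the thesis reserves the learned Bi-LSTM-CRF model for disease names, whose surface forms are far less regular.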
Keywords/Search Tags: Genetic risk, Text classification, Relation extraction, Named entity recognition, Attention mechanism