Research On The Relationship Between Disease And Gene Based On Information Extraction

Posted on: 2021-04-23 | Degree: Master | Type: Thesis
Country: China | Candidate: S Jin
GTID: 2404330623977507 | Subject: Medical informatics
Abstract/Summary:
Purposes: From the perspective of medical informatics, this study aims to discover relationships between diseases and genes in the biomedical literature. To this end, a disease-gene association model was built on literature data. The model defines the intensity and the depth of an association, identifies disease-gene associations in the literature, and supports literature-based knowledge discovery. Using PubMed records on diabetic complications, we identified relationships between diabetic complications and genes and showed that the model is scientifically sound, reasonable, and feasible for knowledge discovery. Text mining of literature abstracts can reveal disease-related genes and their association patterns, providing a basis for disease prevention and treatment.

Methods: Through a literature review, this thesis systematically summarizes the state of research in medical text mining, information extraction, and related fields, and reviews the relevant theories and methods. Building on this, it proposes a disease-gene association discovery model based on information extraction, which discovers relationships between diseases and genes and analyzes the intensity and depth of those relationships. The model integrates ontologies, co-word analysis, named entity recognition, relation extraction, and other techniques to identify and extract disease-gene relationships from literature abstracts. Drawing on knowledge discovery theory and bibliometrics, the thesis defines the intensity and depth of a disease-gene association: association intensity is data-driven and is analyzed by cluster analysis; association depth is driven by the connotation of the relationship and is analyzed from two aspects, semantic relationships and bioinformatics, to reveal how the disease and the gene are related. Abstracts on diabetic complications were used to demonstrate the model. Dictionary-based entity recognition and purpose-built entity extraction rules were used to find disease entities, gene entities, and the relationships between them. Cluster analysis was used to measure the intensity of disease-gene relationships, while semantic relationship analysis and bioinformatics analysis were used to reveal their depth: the semantic relationships show how genes are involved in the disease, and the bioinformatics analysis shows the biological processes in which the related genes take part. The empirical results were checked by retrospective analysis against the original abstracts, confirming the disease-gene relationships found and demonstrating literature-based knowledge discovery in the biomedical field.

Results: (1) The model can find disease-gene relationships in literature data and define and analyze their intensity and depth. (2) In the empirical analysis of association intensity, 656 genes related to diabetic nephropathy were obtained from the literature abstracts. Cluster analysis based on the co-occurrence intensity index of these genes divided them into three groups. Highly correlated genes may represent the established theoretical basis of current research; moderately correlated genes are current research hotspots; and weakly correlated genes are possible knowledge discoveries that may develop into research hotspots in the future. (3) In the empirical analysis of association depth, bioinformatic gene enrichment analysis showed that the proteins encoded by genes related to diabetic complications act mainly in inflammatory response pathways and cancer pathways. Semantic relation extraction found 315 semantic relationships, of which 218 were action relationships, 64 were role relationships, and 33 were regulation relationships.

Conclusion: (1) Based on the theories and methods of text mining and knowledge discovery, this thesis proposes a disease-gene association discovery model that finds and characterizes disease-gene relationships. From the perspective of data association, the model defines the intensity of an association and quantifies it with an association intensity index. From the perspectives of biological and semantic relevance, it defines the depth of an association and classifies the semantic relationships qualitatively. The model reveals entity relationships in the biomedical field from multiple perspectives and at multiple levels, and contributes to the theoretical development of medical informatics. (2) Following the principles of the disease-gene association model, the empirical study found disease-gene relationships in abstracts on diabetic complications and revealed their intensity and depth. The empirical study shows that the association discovery model is scientifically sound, reasonable, and effective, and that it can be used for knowledge discovery from scientific and technological literature in the biomedical field.
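The association-intensity analysis described in the abstract (dictionary-based entity recognition, co-occurrence counting, then clustering genes into high, moderate and low correlation groups) can be sketched in Python as follows. The thesis does not publish its co-occurrence intensity index or clustering settings, so this is a sketch under stated assumptions, not the author's implementation: the dictionaries and abstracts are hypothetical toy data, the index is assumed to be the equivalence index from co-word analysis, E = c_dg² / (c_d · c_g), where c_d counts abstracts mentioning the disease, c_g abstracts mentioning the gene and c_dg abstracts mentioning both, and the three groups come from a one-dimensional k-means with k = 3.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; a real system would load curated vocabularies.
DISEASE = "diabetic nephropathy"
GENES = {"TGFB1", "VEGFA", "ACE", "NPHS1"}

# Hypothetical toy abstracts standing in for PubMed records.
ABSTRACTS = [
    "TGFB1 and VEGFA are elevated in diabetic nephropathy.",
    "TGFB1 signalling drives fibrosis in diabetic nephropathy.",
    "ACE polymorphism and diabetic nephropathy risk.",
    "VEGFA in diabetic retinopathy.",
    "ACE inhibitors in hypertension.",
    "ACE and NPHS1 expression in podocytes.",
    "NPHS1 mutations in Diabetic Nephropathy.",
]


def recognize(text, dictionary, ignore_case):
    """Dictionary-based entity recognition: return the dictionary terms found in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # Word boundaries keep "ACE" from matching inside longer tokens such as "ACEI".
    return {t for t in dictionary
            if re.search(r"\b" + re.escape(t) + r"\b", text, flags)}


def equivalence_index(abstracts, disease, genes):
    """Assumed intensity index: E(d, g) = c_dg**2 / (c_d * c_g), counted per abstract."""
    c_d, c_g, c_dg = 0, Counter(), Counter()
    for text in abstracts:
        found = recognize(text, genes, ignore_case=False)  # gene symbols are case-sensitive
        c_g.update(found)
        if recognize(text, {disease}, ignore_case=True):
            c_d += 1
            c_dg.update(found)
    # Only genes that co-occur with the disease at least once receive a score.
    return {g: c_dg[g] ** 2 / (c_d * c_g[g]) for g in c_dg}


def three_tiers(scores, iterations=100):
    """Assumed clustering: 1-D k-means with k = 3, mapping genes to 'high'/'moderate'/'low'."""
    values = sorted(scores.values())
    centers = [values[0], values[len(values) // 2], values[-1]]  # deterministic start

    def nearest(v):
        return min(range(3), key=lambda k: abs(v - centers[k]))

    for _ in range(iterations):
        groups = [[], [], []]
        for v in values:
            groups[nearest(v)].append(v)
        # Keep an empty cluster's old center instead of dividing by zero.
        new = [sum(g) / len(g) if g else centers[k] for k, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    order = sorted(range(3), key=lambda k: centers[k])  # cluster indices, low to high
    label = {order[0]: "low", order[1]: "moderate", order[2]: "high"}
    return {gene: label[nearest(v)] for gene, v in scores.items()}


scores = equivalence_index(ABSTRACTS, DISEASE, GENES)
tiers = three_tiers(scores)
for gene in sorted(tiers):
    print(gene, round(scores[gene], 3), tiers[gene])
# ACE 0.083 low
# NPHS1 0.125 moderate
# TGFB1 0.5 high
# VEGFA 0.125 moderate
```

Counting at the abstract level matches the unit of analysis used in the thesis (literature abstracts); sentence-level counting would be a stricter alternative. A production dictionary would also need gene synonyms and aliases, which plain regular-expression matching does not handle.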
Keywords/Search Tags: Entity Recognition, Information Extraction, Text Mining, Association Relationship, Co-word Analysis
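The gene enrichment analysis reported in the abstract's results is commonly run as an over-representation test. The thesis does not say which tool or statistic it used, so the Python sketch below is an assumed stand-in rather than the author's method: for each pathway it computes the one-sided hypergeometric p-value of observing at least the actual overlap between the query genes and the pathway. The gene symbols and pathway sets are hypothetical placeholders.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes in the pathway,
    n: query genes, k: query genes that fall in the pathway.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(query, pathways, background):
    """Return (pathway, overlap, p-value) tuples sorted by ascending p-value."""
    background = set(background)
    query = set(query) & background  # genes outside the background are ignored
    N, n = len(background), len(query)
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        k = len(query & members)
        results.append((name, k, hypergeom_tail(N, len(members), n, k)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 placeholder gene symbols and two made-up pathway sets.
BACKGROUND = [f"G{i}" for i in range(1, 21)]
PATHWAYS = {
    "toy inflammatory pathway": ["G1", "G2", "G3", "G4", "G5"],
    "toy unrelated pathway": ["G11", "G12", "G13", "G14", "G15"],
}
QUERY = ["G1", "G2", "G3", "G4", "G11"]

for name, overlap, p in enrich(QUERY, PATHWAYS, BACKGROUND):
    print(f"{name}: overlap={overlap}, p={p:.4g}")
# toy inflammatory pathway: overlap=4, p=0.004902
# toy unrelated pathway: overlap=1, p=0.8063
```

When many pathways are tested, the p-values would normally also be corrected for multiple testing (for example with the Benjamini-Hochberg procedure), which this sketch leaves out.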