Study On Integration Of Gene-disease Association Data In Reactome

Posted on: 2022-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: C Deng
GTID: 2480306575463074 | Subject: Biomedical engineering
Abstract/Summary:
With the rapid growth of the biomedical literature, extracting knowledge from it quickly and accurately has become increasingly important. Biomedical text mining has become a major approach to this problem, and the extraction of entities such as genes and diseases, together with the relationships between them, is one of its most basic and important tasks. Natural language processing models represented by BERT have greatly improved extraction performance and laid a foundation for identifying more gene-disease relationships with higher quality. At the same time, many biomedical studies, as well as the bioinformatics services and databases that support them, have an urgent demand for gene-disease association data. Current research on named entity recognition and relation extraction in the biomedical field mostly tunes parameters and optimizes network structures at the theoretical level and reuses a limited set of public corpora for benchmarking; few studies apply these models to practical biomedical relation extraction tasks, and the same is true for the extraction of gene-disease relationships. The Reactome pathway database, a service with a global user base developed and maintained by the Molecular Systems Group of EMBL-EBI, urgently needs gene-disease association data in order to provide users with richer analysis results and potential supporting evidence. This thesis studies the extraction of gene and disease entities and their relationships based on BioBERT, and develops a gene-disease association analysis subsystem for the Reactome pathway database. The details are as follows:

1. Establishment and evaluation of a process for extracting gene and disease entities and their relationships. First, a text dataset of 92,597 literature abstracts was constructed. Second, a relation extraction process based on the BioBERT model was built and applied to this dataset to extract gene and disease entities and their relationships, yielding a total of 2,501 gene-disease association pairs. This work can serve as a reference for applying machine learning methods to gene-disease association extraction.

2. The Reactome gene-disease association data subsystem. First, the original data from DisGeNET were processed and converted into gene-disease associations, which were stored in a database. Second, a Reactome gene-disease association display subsystem was designed and built to present the processed associations, and the existing Reactome pathway analysis tools were used to analyze the genes associated with specific diseases. Then, based on the processed data, a service conforming to the PSICQUIC specification was built to provide gene-disease association data to the community. Finally, the PSICQUIC data source was integrated into the existing Reactome analysis tools so that gene-disease association data are extended to molecular-level results. The display page has been online for more than six months, during which more than 300 unique IP addresses have accessed the service, suggesting that the subsystem is of practical use to researchers.
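The data-processing step described in item 2 (converting tabular source data into per-disease gene sets that a pathway analysis tool can consume) might look roughly like the following. This is a minimal sketch, not the thesis's implementation: the column names (`geneSymbol`, `diseaseId`, `score`), the score threshold, the function name `genes_by_disease`, and all sample rows are placeholders assumed for illustration.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; the identifiers, scores, and
# column names are assumptions, not data or schema taken from the thesis.
SAMPLE_TSV = (
    "geneSymbol\tdiseaseId\tscore\n"
    "GENE_B\tD001\t0.80\n"
    "GENE_A\tD001\t0.90\n"
    "GENE_C\tD002\t0.70\n"
    "GENE_A\tD001\t0.90\n"   # duplicate row, should be collapsed
    "GENE_D\tD002\t0.05\n"   # below the threshold, should be dropped
)


def genes_by_disease(tsv_text, min_score=0.3):
    """Group gene symbols by disease ID, keeping rows with score >= min_score.

    Returns a dict mapping each disease ID to a sorted list of unique genes.
    """
    groups = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if float(row["score"]) >= min_score:
            groups[row["diseaseId"]].add(row["geneSymbol"])
    # Sort for deterministic output; sets also remove duplicate associations.
    return {disease: sorted(genes) for disease, genes in groups.items()}


if __name__ == "__main__":
    for disease, genes in genes_by_disease(SAMPLE_TSV).items():
        # One gene list per disease, ready to hand to an analysis tool.
        print(disease, ",".join(genes))
```

Each resulting gene list could then be submitted to a pathway analysis tool for the corresponding disease; how the thesis actually stores and serves these associations is not specified in the abstract.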
Keywords/Search Tags:gene-disease association, named entity recognition, relation extraction, pathway database