Research on Prostate Cancer Drug Repurposing Based on Text Mining and Multi-source Data

Posted on: 2024-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Jiang
GTID: 2544306911993839
Subject: Computer Science and Technology
Abstract/Summary:
Cancer is among the diseases with the highest fatality rates and has a major impact on human society. Recent domestic and international cancer data reports show that the number of people with cancer continues to rise globally every year. Prostate cancer stands out among the growing number of new cancer cases. In Europe and the United States, the number of prostate cancer patients continues to increase. Although the proportion of prostate cancer in Asia is not high, the large population base means that the number of patients is far from negligible. Surgery and radiation therapy are the mainstream treatments, and few drugs are available for drug therapy. This thesis therefore studies drug repositioning for prostate cancer. The study is divided into two stages: a text mining stage and a drug screening stage. The specific research contents are as follows.

1. Construct the CPCaRE prostate cancer data set and build the improved XLC-Casrel model, based on the CasRel model, for entity relation extraction from prostate cancer text data. Because the final recommendations include both traditional Chinese medicine (TCM) and Western medicine, and Chinese medical literature contains a large amount of information related to TCM, Chinese medical text mining and multi-source data methods are used to carry out the drug repositioning research. In the text mining stage, this thesis proposes XLC-Casrel, an improved relation extraction model based on the cascade pointer tagging model CasRel. The model consists of four modules: a chinese-xlnet-base pre-trained encoder module, a head entity extraction module, a conditional layer normalization module, and a relation-specific tail entity extraction module. To better support extraction from prostate cancer texts, this thesis constructs CPCaRE, a corpus of Chinese medical texts on prostate cancer, following the structure of the public data set CMeIE and using web crawling and data preprocessing. The model is tested on both the CMeIE and CPCaRE data sets and achieves better performance on both. On CMeIE, XLC-Casrel achieves a precision of 65.53%, a recall of 63.19%, and an F1 score (the harmonic mean of the two) of 64.34%, all higher than those of the base model. On CPCaRE, it achieves a precision of 65.62%, a recall of 62.31%, and an F1 score of 63.92%.

2. Combine the extracted triples with multi-source databases to mine the core genes of prostate cancer and to recommend drugs centered on those core genes. In the drug screening stage, the XLC-Casrel model is first used to extract entity relation triples from the original text, yielding effective gene, symptom, and TCM sets. Based on gene-symptom, gene-drug, and protein-protein interactions, the GenCLiP3, Metascape, and Cytoscape tools are combined with the omics data collected from the STRING, DGIdb, and SymMap databases. Through gene ontology and functional enrichment analysis and protein-protein interaction analysis, 13 core genes that are more closely associated with prostate cancer are selected. Finally, after screening in the drug database, recommendations are given for chemical drugs, traditional Chinese medicines, and TCM efficacies, and the rationality of the recommended drugs is verified against the official clinical trial record database and through literature tracing.
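The F1 values reported for the text mining stage can be checked against the quoted figures, since F1 is the harmonic mean of precision and recall. The following is a minimal Python sketch, assuming that the first percentage quoted for each data set is the precision. The function name `f1_score` and the `reported` dictionary are illustrative and do not come from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as quoted in the abstract: (precision, recall, reported F1).
# Assumption: the first figure given for each data set is precision.
reported = {
    "CMeIE": (65.53, 63.19, 64.34),
    "CPCaRE": (65.62, 62.31, 63.92),
}

for name, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

Under that assumption, the recomputed values agree with the reported F1 scores to two decimal places on both data sets.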
Keywords/Search Tags:prostate cancer, Chinese medical text mining, relationship extraction, drug repurposing