| Cancer has become a major threat to human life: in 2016 alone, 8.9 million people died of cancer. Existing anticancer drugs are inadequate, and most cancer treatments eventually meet resistance, which makes drug development extremely important. However, traditional drug development is slow, costly, and risky. Drug repositioning, as an alternative route to new drugs, can effectively reduce development risk, shorten development time, and lower research and development costs. Current transcriptome-based drug repositioning methods consider only genes that are significantly differentially expressed and ignore disease-causing genes. This paper therefore proposes a drug prediction method based on both differentially expressed genes and pathogenic genes, which takes full account of the importance of pathogenic genes to the disease. The method first sorts the significantly differentially expressed genes and the pathogenic genes in ascending order of differential expression value, so that down-regulated genes are at the top and up-regulated genes are at the bottom. The down-regulated pathogenic genes are then added to the top and the up-regulated pathogenic genes to the bottom, forming the query gene set for the disease. Next, the association score between each drug and the disease is calculated with a Kolmogorov-Smirnov statistic, and prediction accuracy is evaluated against the Comparative Toxicogenomics Database. Compared with the Connectivity Map method without pathogenic genes, the proposed method is more accurate. In addition, the top 30 drugs predicted for breast cancer were validated against the literature and against KEGG pathways. Based on an analysis of these results, it is hypothesized that highly differentially expressed genes may be noise that degrades the prediction result, that is, reduces prediction accuracy. A screening algorithm is therefore proposed that filters the differentially expressed genes of a disease to obtain the optimal differential gene set. Experiments were first performed on breast cancer data to obtain its optimal differential gene set, and the corresponding differential expression threshold was then applied to sixteen other diseases. Ten of these diseases were found to contain such noise. Using the data for these ten diseases, a model for predicting the threshold of the disease gene set was built and verified on data for six types of cancer. The accuracy of the top ten predicted drugs improved by at least 10%, and the accuracy for sarcoma improved by 30%. To verify that the model also applies to non-cancer data, two non-cancer datasets were used for verification, and accuracy on these data also improved by 10%. In summary, considering pathogenic genes improves the accuracy of the results. The method based on differentially expressed genes and pathogenic genes is simple, fast, and accurate for diseases with known pathogenic genes. For diseases without known pathogenic genes, the constructed model can predict the gene set threshold, and the resulting optimal differential gene set also yields highly accurate predictions. |
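
The query construction and scoring steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Kolmogorov-Smirnov statistic in the style of the original Connectivity Map, a drug profile given as a gene list ordered from most up-regulated to most down-regulated, and a single absolute cutoff on the differential expression value. The names `build_query`, `ks_score`, `association_score`, and `threshold`, and the toy gene identifiers, are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_query(diff_expr: Dict[str, float], pathogenic: Set[str],
                threshold: float) -> Tuple[List[str], List[str]]:
    """Return (down_genes, up_genes) forming the disease query gene set."""
    # Assumed filter: keep non-pathogenic genes whose absolute differential
    # expression value reaches the cutoff.
    de = {g: v for g, v in diff_expr.items()
          if abs(v) >= threshold and g not in pathogenic}
    ranked = sorted(de, key=de.get)  # ascending: most down-regulated first
    # Pathogenic genes are kept regardless of the cutoff.
    patho = sorted((g for g in pathogenic if g in diff_expr), key=diff_expr.get)
    # Down-regulated pathogenic genes go on top, up-regulated ones at the bottom.
    down = [g for g in patho if diff_expr[g] < 0] + [g for g in ranked if de[g] < 0]
    up = [g for g in ranked if de[g] > 0] + [g for g in patho if diff_expr[g] > 0]
    return down, up


def ks_score(tags: Sequence[str], drug_ranking: Sequence[str]) -> float:
    """Signed KS-like enrichment of `tags` within a ranked drug profile."""
    n = len(drug_ranking)
    position = {g: i + 1 for i, g in enumerate(drug_ranking)}  # 1-based ranks
    hits = sorted(position[g] for g in tags if g in position)
    t = len(hits)
    if t == 0:
        return 0.0
    a = max((j + 1) / t - v / n for j, v in enumerate(hits))
    b = max(v / n - j / t for j, v in enumerate(hits))
    return a if a > b else -b


def association_score(down: Sequence[str], up: Sequence[str],
                      drug_ranking: Sequence[str]) -> float:
    """Combine up and down KS scores; same-sign results are treated as 0."""
    ks_up = ks_score(up, drug_ranking)
    ks_down = ks_score(down, drug_ranking)
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


# Toy data (hypothetical): g2 and g5 are pathogenic but below the cutoff.
diff_expr = {"g1": -3.0, "g2": -0.2, "g3": 0.1, "g5": 0.3, "g6": 2.5}
down, up = build_query(diff_expr, {"g2", "g5"}, threshold=1.0)
drug_ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
score = association_score(down, up, drug_ranking)
```

Under these assumptions, a negative score means the drug profile tends to reverse the disease signature, which is the pattern a repositioning candidate would show. In the toy data, the pathogenic genes `g2` and `g5` enter the query even though their differential expression falls below the cutoff.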