| With the development of high-throughput sequencing technology, a large amount of omics data has accumulated. Mining valuable information from this data and applying it to medicine and pharmacy is a major challenge for omics data analysis. Because biological data is highly complex and very high-dimensional, artificial intelligence methods are increasingly applied to it. This paper focuses on the application of AI-based multi-omics data analysis methods in drug design, including a deep learning-based model for miRNA target gene prediction (Chapter 2) and response prediction for immune checkpoint inhibitor drugs together with the discovery of corresponding biomarkers (Chapter 3). miRNAs play an important role in the human body. They are important biomolecules that distinguish humans from other organisms at the genomic level. At present, many miRNA molecules are used as drug targets, molecular therapies, and disease biomarkers. A miRNA must bind its target gene before it can exert a biological effect, so accurate prediction of miRNA target genes has a large impact on miRNA-related drug treatment, disease prevention, and translational medicine. Current methods for predicting miRNA target genes either rely on manually selected features and therefore miss potential binding sites that fall outside the predefined rules, or are prone to overfitting because of class imbalance in the databases. Building a method that automatically learns both canonical and non-canonical miRNA-mRNA binding patterns while avoiding overfitting is therefore a difficult task. In Chapter 2, we build a hybrid model based on CNN and Transformer and apply it to miRNA target gene prediction. The new model predicts which mRNAs are targeted by miRNAs, as well as the specific sites on the targeted mRNAs. Compared with the two existing types of models, the new model achieves the best predictive performance (accuracy = 0.823, specificity = 0.810), far exceeding the baseline model (miRAW; accuracy = 0.688, specificity = 0.471). The new model also alleviates overfitting during deep model training. We then analyzed why the model works through ablation experiments, and found that the hybrid architecture predicts better than either single architecture and that the model relied only on miRNA sequence information when predicting most samples. In addition, a case analysis of hsa-miR-552 targeting FXR2 suggests that manually extracted features cannot identify the true binding sites in the sequences of uncharacterized mRNAs. Decoy experiments showed that the new model can identify the true binding sites contained in a set of sequences. In conclusion, this study provides a reference for processing biological sequence information with deep learning models, and addresses two important problems in miRNA target gene prediction through a new model architecture. Immune checkpoint therapy is currently the most commonly used immunotherapy regimen and is widely used to treat various types of tumors. It can act synergistically with radiotherapy, chemotherapy, and targeted therapy, and has significantly extended the survival of cancer patients. Immunotherapy also has the advantages of a long duration of action and low toxicity and adverse effects. Despite its great clinical success, it has a major drawback: response varies widely between individuals. Studies have reported that only about 20% of cancer patients benefit from immune checkpoint inhibitor therapy. Identifying and expanding the population that benefits is a major problem facing immune checkpoint therapy at this stage. Existing studies on immune checkpoint
inhibitor response prediction often draw conclusions from statistical analyses of small samples and rely on individual biomarkers, ignoring the correlations among potential markers. In Chapter 3, we integrate multi-omics data and systematically build machine learning models to predict drug response and discover relevant biomarkers. We collated data from the iAtlas database and from records in clinical articles, assembling a cohort of 281 patients with demographic (sex and age), transcriptomic, immunomic, and clinical follow-up data. In the data processing stage, we filtered 18,417 gene signatures using records in the GSEA database and the published literature, and obtained 1,419 gene signatures that were highly correlated with immune checkpoint therapy. We then modeled the data with a random forest and predicted long-term clinical benefit for patients. On the external validation set, the model (AUROC = 0.850) outperforms the baseline method (TIDE, AUROC = 0.763). After the model was built, the contributions of the input features were ranked using the inherent structure of the random forest, which identified the key gene features CLEC4A, P2RY6, FBXL5, PCMTD1, and IFITM10. Finally, gene enrichment analysis showed that the key gene features were mainly enriched in pathways related to tuberculosis immune response, iron binding, and endogenous cell membrane composition. In this study, we found that using multi-omics data can improve the interpretability of drug response prediction models and help identify key biomarkers. We also explore methods for processing complex, high-dimensional omics data. In summary, this paper focuses on methods and applications of AI-assisted multi-omics data analysis in drug design, comprising two research projects: miRNA target gene prediction and response prediction for immune checkpoint inhibitors. For the
first project, we built the deep learning model with the highest accuracy among the compared methods and addressed the excessive false positives that class-imbalanced samples cause in other models. In the second project, we collected a cohort of 281 patients receiving immunotherapy from the literature and databases, built a random forest model on demographic, omics, and clinical features that outperforms the baseline model, and identified potential biomarkers for predicting response to immune checkpoint inhibitors. |
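
The abstract reports both accuracy and specificity for the miRNA target models and attributes excessive false positives to class imbalance. The sketch below illustrates why specificity is reported alongside accuracy in that setting. It is only an illustration under stated assumptions: the toy labels, the two hypothetical predictors, and the function names are invented for this example and are not taken from the thesis or from miRAW.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels: 1 = target, 0 = non-target."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(y_true, y_pred):
    """Fraction of true negatives among all actual negatives: TN / (TN + FP)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative samples")
    return tn / (tn + fp)


# Hypothetical imbalanced toy set: 8 true targets, 2 non-targets.
y_true = [1] * 8 + [0] * 2

# Hypothetical predictor A calls every pair a target.
always_positive = [1] * 10

# Hypothetical predictor B misses one target but rejects both non-targets.
selective = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

for name, y_pred in [("always_positive", always_positive), ("selective", selective)]:
    print(name, accuracy(y_true, y_pred), specificity(y_true, y_pred))
```

On this toy set the always-positive predictor still reaches an accuracy of 0.8 while its specificity is 0.0, because every non-target becomes a false positive. Reporting specificity next to accuracy exposes this kind of failure on imbalanced data.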