| Hypomethylation of the human cancer genome and hypermethylation of specific tumor suppressor gene promoters are important causes of the rapid proliferation of cancer cells. Mapping the distribution of 5-methylcytosine (5mC) in promoter regions is therefore a key step toward understanding the relationship between promoter methylation and the regulation of mRNA expression. Large-scale detection of 5mC sites in DNA by wet-lab experiments remains time-consuming and laborious, so a computational method for identifying 5mC sites in promoters across the whole genome is urgently needed. This paper studies the prediction of promoter methylation sites using a new fusion decision model. The main research contents are as follows: 1) A fusion decision predictor called iPromoter-5mC was constructed. From the Cancer Cell Line Encyclopedia (CCLE) database, promoter-region information for 17,182 genes across 843 cell lines, generated by reduced representation bisulfite sequencing, was used to construct a small cell lung cancer (SCLC) promoter methylation dataset. Because the ratio of positive to negative samples is as skewed as 1:11, the data are imbalanced; we therefore built 11 predictors so that the imbalanced data could be converted into balanced data. Promoter samples are encoded with one-hot encoding and the deoxynucleotide property and frequency (DPF) method. Each predictor uses a deep neural network (DNN) to identify methylation sites in the promoter, and a fusion decision method combines the prediction results of the 11 predictors. The predictor achieves an average AUC of 0.957 on the independent test set, indicating that it reliably predicts promoter 5mC methylation sites. To make the iPromoter-5mC predictor convenient for biologists and gene pharmacologists, we designed a free online prediction website: http://www.jci-bioinfo.cn/iPromoter-5mC. Researchers do not need to understand the complicated mathematical
formulas and programming procedures; they only need to submit a sequence through the website to obtain the desired results, which gives users a simple and effective way to study promoter 5mC modification sites. The source code of the method is also available at https://github.com/zlwuxi/iPromoter-5mC for related academic research. 2) A promoter recognition predictor based on a convolutional neural network was constructed. Predicting first whether an input sequence is a promoter can effectively improve the prediction of promoter methylation sites. Based on the promoter-region information obtained in the first part, a dataset of promoters and non-promoters was constructed. The gradient boosting decision tree (GBDT), deep neural network, XGBoost, and convolutional neural network modeling methods were compared to build a predictive model that distinguishes promoters from non-promoters. Tested on an independent test set, the model achieves an average AUC of 0.9166, indicating that the promoter recognition predictor is reliable. |
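
The encoding, balancing, and fusion steps described in part 1) can be illustrated with a minimal Python sketch. It is not the iPromoter-5mC implementation. The function names are hypothetical, and three details are assumptions, since the abstract does not specify them: that balancing means splitting the negative samples into 11 chunks and pairing each with all positives, that the fusion rule is a mean of the predictors' probabilities, and that the decision threshold is 0.5. DPF encoding and the DNN itself are omitted.

```python
import random

NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a flat list of 0/1 values, four per base."""
    encoded = []
    for base in sequence.upper():
        # Assumption: symbols outside A/C/G/T (e.g. N) map to four zeros.
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


def balanced_subsets(positives, negatives, n_subsets=11, seed=0):
    """Pair all positives with one of n_subsets disjoint chunks of negatives.

    Assumed reading of "11 predictors": with a 1:11 class ratio, each
    chunk of negatives is roughly the same size as the positive set.
    """
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [(list(positives), chunk) for chunk in chunks]


def fuse(scores, threshold=0.5):
    """Average the per-predictor probabilities and threshold the mean.

    Assumption: mean fusion; the abstract does not state the actual rule.
    """
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold
```

In use, one model would be trained on each pair returned by `balanced_subsets`, and the 11 probabilities predicted for a candidate site would be passed to `fuse` to obtain the final call.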