
Using Deep Learning To Identify Gene Splicing Sites Of Crops

Posted on: 2020-12-05 | Degree: Master | Type: Thesis
Country: China | Candidate: R Li
GTID: 2370330575464188 | Subject: Agricultural informatization
Abstract/Summary:
Accurate identification of gene splicing sites is of great significance for understanding and controlling the expression of genetic traits. Using gene splicing site datasets from Arabidopsis thaliana, rice and maize, a deep learning network model, DeepAS (CNN+GRU+LSTM), was designed by combining convolutional and recurrent neural networks. Based on the DeepAS model, a crop gene splicing site identification system was developed to identify crop gene splicing sites quickly and accurately, which makes the model convenient for researchers to use and speeds up research. The main research contents are as follows:

1. Gene splicing site datasets were extracted from the genetic data of three crops (Arabidopsis thaliana, rice and maize), and a splice site training set was built separately for each.

2. Based on the TensorFlow+Keras deep learning framework, we proposed a crop gene splicing site identification model. 51 different model structures were designed, then trained and tested on the splicing site datasets of the three crops and on their mixed dataset. The network with the highest identification accuracy on each dataset was named DeepAS, and its model and weights were saved. Experiments show that the DeepAS network model has good accuracy and generalization ability when identifying crop gene splicing sites. On the mixed crop dataset, the accuracy is 97.09%, the precision is 96.88%, the recall is 96.92% and the F1 score is 96.90%, which is superior to the traditional machine learning models and the deep learning models of other researchers compared in this thesis.

3. The characteristics of splicing site sequences were studied further. Verification used a special non-splicing site dataset in which the main feature, the GT-AG rule, is removed as a distinguishing cue, and the identification accuracy remained above 96%. This shows that the learned features are not limited to the GT-AG rule but include more complex sequence features, and it also shows that the DeepAS model has good stability and generalization ability. In addition, a three-class model that can identify acceptor splicing sites, donor splicing sites and non-splicing sites was designed on the basis of the two-class model and tested on the mixed splicing site dataset of the three crops. Its accuracy is 85.91%, which fills a gap in the three-class identification problem.

4. Based on the designed DeepAS model, a crop gene splicing site identification system was developed. The system selects the corresponding model according to the data type chosen by the user. After data is entered or uploaded and then submitted, the identification result is returned in real time. The system URL is http://www.deepbiology.cn/DeepAS/.
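Two ideas in the abstract can be sketched in a few lines of Python: the GT-AG rule, and the four metrics reported for the two-class task. This is a minimal illustration, not the thesis code. The function names `follows_gt_ag`, `f1_from` and `binary_metrics` are hypothetical, and the sketch assumes that the reported 96.88% figure is precision. Under that assumption, the harmonic mean of 96.88% and 96.92% reproduces the reported F1 score of 96.90%.

```python
from typing import Dict


def follows_gt_ag(intron: str) -> bool:
    """Return True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics for a two-class task (splice site vs. non-splice site)."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    print(follows_gt_ag("GTAAGTTTTCAG"))
    print(follows_gt_ag("GCAAGTTTTCAG"))
    # Consistency check on the reported figures (precision assumed).
    print(round(f1_from(0.9688, 0.9692), 4))
    # Made-up confusion counts, not data from the thesis.
    print(binary_metrics(tp=90, fp=10, fn=5, tn=95))
```

A check such as `follows_gt_ag` only captures the canonical dinucleotide signal. The abstract's point is that DeepAS stays above 96% accuracy even when that signal no longer separates the classes, so a rule of this kind is a baseline at best.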
Keywords/Search Tags: Deep Learning, Gene Splicing Site Identification, Convolutional Neural Network, Recurrent Neural Network