
Using Machine Learning To Predict Maize Flowering Genes Based On Multi-omics Data

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: P Tian
GTID: 2393330611483039
Subject: Crop Genetics and Breeding
Abstract/Summary:
Flowering time marks the key transition in plants from the vegetative to the reproductive growth phase and is closely related to crop yield. In the model plant Arabidopsis thaliana, many flowering-time genes have been identified and multiple regulatory pathways have been revealed. In maize, however, a staple crop worldwide, research on flowering time lags behind: only a few genes have been identified, which has significantly limited the exploitation of maize yield potential. Because classical cloning strategies are labor- and time-consuming, this study used machine learning, a state-of-the-art technique, on multi-omics data to identify flowering-time genes in batch, and verified candidates with EMS mutants in maize. The results are as follows:
1. Training and test data sets were assembled for machine learning. The training set contains 39 flowering-related genes and 39 non-flowering-related genes; the test set contains 16,564 genes with unknown labels. The attributes of both sets are multi-omics data comprising the transcriptome, the translatome, and the protein-protein interactome.
2. Six algorithms were trained and evaluated on the training set, and three of them, AdaBoost, Logistic Regression, and SVM, were selected to predict flowering-time genes. On the integrated multi-omics data set, their AUC scores were 0.86 ± 0.10, 0.90 ± 0.03, and 0.86 ± 0.09, respectively. Evaluating the algorithms on the different data sources showed that the integrated multi-omics data performed best, while the protein interactome data performed poorly.
3. Forty-eight genes were randomly selected from the positive samples of the training set and from the predicted genes, and mutants carrying truncating mutations in these genes were purchased; 17 mutants were ultimately obtained, and their phenotypes were recorded and statistically tested. Five of these mutants showed a significant association with flowering time, and the validation rate of the predicted genes (30%) was nearly the same as that of the training genes (28.6%).
4. The predicted gene Zm00001d011748, named MADS43, contains the development-related domains TF_MADSbox and TF_Kbox. Its mutant carries a premature stop (early termination) mutation located within the TF_MADSbox domain. Compared with the wild type, the mutant reached tasseling nearly 5 days earlier and pollen shed nearly 4 days earlier. Finally, analysis of the MADS43 interaction network indicates that the gene may participate in a complex network linked to multiple developmental pathways and thereby affect flowering time.
In this study, a number of candidate flowering-time genes were predicted by machine learning on multi-omics data, and some of them were functionally verified with EMS mutants. These results confirm the effectiveness of the method, advance the study of maize flowering time, and provide a new approach to gene mining in maize.
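Results 1 and 2 describe a small, balanced training set (39 positive and 39 negative genes) with features from three omics layers, on which classifiers were compared by AUC, reported as mean ± SD. The sketch below shows one way such a comparison could be set up with scikit-learn. It is not the thesis code: the feature matrix is synthetic (`make_classification`), the split of columns into transcriptome, translatome and interactome blocks is hypothetical, and stratified 5-fold cross-validation is an assumed protocol.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the labelled training set: 78 genes, exactly 39/39
# (flip_y=0 prevents random label flips from unbalancing the classes).
X, y = make_classification(
    n_samples=78, n_features=30, n_informative=6, flip_y=0.0, random_state=0
)

# Hypothetical column blocks, one per omics layer, plus the integrated set.
LAYERS = {
    "transcriptome": slice(0, 10),
    "translatome": slice(10, 20),
    "interactome": slice(20, 30),
    "integrated": slice(0, 30),
}

# The three classifiers retained in the abstract; scale-sensitive models
# (logistic regression, RBF SVM) are wrapped with standardization, which the
# pipeline fits on the training folds only.
MODELS = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

def evaluate(X, y, n_splits=5):
    """Return {(layer, model): (mean AUC, SD of AUC)} over stratified k folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for layer, cols in LAYERS.items():
        for name, model in MODELS.items():
            aucs = cross_val_score(model, X[:, cols], y, cv=cv, scoring="roc_auc")
            results[(layer, name)] = (aucs.mean(), aucs.std())
    return results

results = evaluate(X, y)
for (layer, name), (mean_auc, sd_auc) in sorted(results.items()):
    print(f"{layer:>13} {name:>18}: AUC {mean_auc:.2f} ± {sd_auc:.2f}")
```

With only about 8 genes per class in each test fold, fold-level AUC is noisy, which is one reason SDs as large as ± 0.10 are plausible for a training set of this size.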
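Result 2 states that the three retained models were used to predict flowering-time genes among the 16,564 unlabelled genes, but the abstract does not say how their outputs were combined. The sketch below assumes one simple rule, a consensus vote that keeps a gene only if all three models call it positive. The data are synthetic and the gene IDs are hypothetical placeholders, not maize gene models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: the first 78 rows stand in for the labelled training genes,
# the remaining 500 rows for unlabelled genes (the thesis had 16,564).
X_all, y_all = make_classification(
    n_samples=578, n_features=30, n_informative=6, flip_y=0.0, random_state=1
)
X_train, y_train = X_all[:78], y_all[:78]
X_unlabelled = X_all[78:]
gene_ids = [f"gene_{i:05d}" for i in range(len(X_unlabelled))]  # hypothetical IDs

models = [
    AdaBoostClassifier(random_state=0),
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    make_pipeline(StandardScaler(), SVC()),
]

# Fit each model on the labelled genes (fit returns the fitted estimator),
# then predict a 0/1 label for every unlabelled gene.
votes = [model.fit(X_train, y_train).predict(X_unlabelled) for model in models]

# Consensus rule (an assumption): keep a gene only if every model predicts 1.
candidates = [
    gid for gid, *calls in zip(gene_ids, *votes) if all(c == 1 for c in calls)
]
print(f"{len(candidates)} of {len(gene_ids)} unlabelled genes called by all three models")
```

A unanimous vote trades recall for precision; ranking genes by averaged probabilities or decision scores would be an equally reasonable alternative.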
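Result 3 reports 5 validated mutants out of 17, with validation rates of 30% for predicted genes and 28.6% for training genes. Those percentages are consistent with 3 of 10 predicted genes and 2 of 7 training genes (3/10 = 30%, 2/7 ≈ 28.6%, 10 + 7 = 17, 3 + 2 = 5); this split is inferred from the rounding rather than stated in the abstract. The sketch below checks the arithmetic with exact fractions and adds a two-sided Fisher's exact test, which is not part of the original analysis, to put a number on "nearly the same".

```python
from fractions import Fraction
from math import comb

# Inferred counts (an assumption reconstructed from the reported percentages).
predicted_hits, predicted_total = 3, 10
training_hits, training_total = 2, 7

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table. Exact Fractions avoid
    floating-point ties.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return Fraction(comb(row1, x) * comb(n - row1, col1 - x), comb(n, col1))

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs)

pred_rate = Fraction(predicted_hits, predicted_total)
train_rate = Fraction(training_hits, training_total)
print(f"predicted: {float(pred_rate):.1%}, training: {float(train_rate):.1%}")

# Rows: predicted / training genes; columns: validated / not validated.
p = fisher_exact_two_sided(
    predicted_hits, predicted_total - predicted_hits,
    training_hits, training_total - training_hits,
)
print(f"two-sided Fisher exact p = {float(p):.3f}")
```

For these counts the observed table is the most likely one given its margins, so p = 1: with 17 mutants the test has no power to separate the two rates, which supports reading them as comparable rather than as a measured difference.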
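Result 4 compares the MADS43 mutant with the wild type in days to tasseling and to pollen shed. The abstract does not name the statistical test; a Welch two-sample t-test on per-plant days is one common choice, sketched below in plain Python. The input numbers are made-up illustrative values, not measurements from the thesis. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical days from sowing to tasseling (illustrative values only).
wild_type = [62, 63, 61, 64, 62]
mutant = [57, 58, 57, 58, 57]

advance = mean(wild_type) - mean(mutant)  # days earlier for the mutant
t, df = welch_t(wild_type, mutant)
print(f"mutant tassels {advance:.1f} days earlier; Welch t = {t:.2f}, df = {df:.1f}")
```

Welch's version is used here because it does not assume equal variances in the two genotypes; the same function applies unchanged to days to pollen shed.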
Keywords/Search Tags: Zea mays L., machine learning, flowering time, multi-omics, EMS mutant, MADS43