
Computational Biology Research On Rice Salt Tolerance Mechanism

Posted on: 2017-03-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J X Wang
GTID: 1223330482994778 | Subject: Computer Science and Technology
Abstract/Summary:
Salinity is one of the most common abiotic stresses in agricultural production, and it causes enormous losses in food production worldwide. Using molecular breeding to give crops salt tolerance is therefore vital for coping with today's salinity threats, and our nation would also benefit from stable, high crop production. Salt tolerance in rice (Oryza sativa) is a complex trait controlled by many genes. Several wild rice species have been identified as salt tolerant under salt stress. The mechanism of rice salt tolerance, which is currently poorly understood, is of great interest to molecular breeding aimed at improving grain yield.
Several researchers have applied microarray and sequencing technologies to study gene expression differences among salt-tolerant species, and they have discovered and validated a series of genes related to salt tolerance. However, the major computational problem in analyzing gene expression profiles is feature selection from small-sample, high-dimensional data (small n, large p), i.e., how to select the most trait-relevant genes and gene lists from a few samples and tens of thousands of genes. Meanwhile, quantitatively evaluating and validating the relationship between gene lists and the relevant trait is another important challenge.
In this research, we constructed a mechanism network for rice salt tolerance using a series of new computational biology methods, shedding light on rice salinity tolerance from a systems biology view. We used dataset GSE14403, which contains the largest number of rice salinity-tolerance samples in the Gene Expression Omnibus (GEO). Using this microarray data, we developed a novel improved volcano plot method based on machine learning to analyze gene expression data and select rice salinity-related genes.
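For reference, the classical volcano plot that the improved method builds on scores each gene by its log2 fold change between two groups and its t-test p-value, and keeps genes that pass both cutoffs. The Python sketch below is a minimal illustration under stated assumptions, not the dissertation's improved method: it assumes log2-transformed expression values (so the log2 fold change is a difference of group means), uses Welch's t-test, and uses illustrative cutoffs of |log2FC| >= 1 and p <= 0.05. All function names are hypothetical. Only the standard library is used, so the t-distribution tail is computed through the regularized incomplete beta function.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even and then the odd coefficient.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(x, y):
    """Two-sided Welch t-test; returns (t, p). Needs >= 2 values per group."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    if vx + vy == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # Two-sided tail of Student's t: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p


def classical_volcano(group_a, group_b, fc_cut=1.0, p_cut=0.05):
    """group_a/group_b map gene -> list of log2 expression values.
    Returns gene -> (log2 fold change, -log10 p, passes both cutoffs)."""
    result = {}
    for gene, values_a in group_a.items():
        values_b = group_b[gene]
        log2fc = mean(values_a) - mean(values_b)
        _, p = welch_t_test(values_a, values_b)
        selected = abs(log2fc) >= fc_cut and p <= p_cut
        result[gene] = (log2fc, -math.log10(max(p, 1e-300)), selected)
    return result
```

On the toy groups [1, 2, 3] versus [4, 5, 6], Welch's t is about -3.67 with 4 degrees of freedom and a two-sided p of about 0.021, so that gene passes both cutoffs.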
The improved volcano plot method is based on bootstrap SVM-RFE and uses a new measure to improve on the classical volcano plot, making it more reasonable with respect to both statistical and computational criteria. Unlike feature selection on benchmark datasets with known ground truth, feature selection on rice data is very difficult to evaluate. Quantitative trait locus (QTL) analysis is a classical genetics-based method for finding genotype-phenotype relationships, and QTL resources contain abundant trait-related information. We therefore developed the Microarray-QTL test, based on the hypergeometric distribution, to test the relationship between candidate gene sets and the relevant traits. Under the Microarray-QTL test criterion, the improved volcano plot method achieved much better results for salinity tolerance than the classical fold-change, t-test, and volcano plot methods, suggesting that it is considerably more robust on this dataset.
To validate the resulting gene list, we used Gene Ontology enrichment analysis with the agriculture-specific tool agriGO together with the Microarray-QTL test. We mapped the selected genes to proteins, retrieved protein-protein interaction data from the DIPOS database, and constructed a rice salinity mechanism network. Clustering analysis of the whole network's topology yielded 17 modules related to salinity tolerance. According to the Gene Ontology enrichment annotation, these modules include phosphorylation activity, ubiquitin-related activity, and several enzyme activities such as peroxidase, aspartic proteinase, glucosyltransferase, and flavonol synthase. All of these modules are related to salt tolerance mechanisms: signal transduction, ion pumping, abscisic acid mediation, reactive oxygen species scavenging, and ion sequestration.
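The Microarray-QTL test can be read as a standard hypergeometric enrichment test: among the N genes on the array, K lie inside known salt-tolerance QTL intervals, and the question is whether the n selected genes contain more QTL genes (k) than random draws would. The sketch below assumes gene coordinates are (chromosome, position) tuples and QTL intervals are (chromosome, start, end) tuples, and computes the upper tail P(X >= k) exactly with integer binomial coefficients. The helper names are hypothetical, and the dissertation's exact counting (for example, per gene versus per QTL) may differ.

```python
from math import comb


def in_qtl(position, qtl_intervals):
    """position: (chromosome, bp); qtl_intervals: iterable of (chromosome, start, end)."""
    chrom, bp = position
    return any(c == chrom and start <= bp <= end for c, start, end in qtl_intervals)


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in QTL regions, n selected)."""
    hits = range(max(k, 0), min(K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def microarray_qtl_test(all_genes, selected, positions, qtl_intervals):
    """Return (k, N, K, n, p): overlap of the selected list with QTL regions and its p-value."""
    in_region = {g for g in all_genes if in_qtl(positions[g], qtl_intervals)}
    N, K, n = len(all_genes), len(in_region), len(selected)
    k = len(in_region.intersection(selected))
    return k, N, K, n, hypergeom_upper_tail(k, N, K, n)
```

A smaller p means the selected list falls inside QTL regions more often than chance, which is how competing feature selection methods (fold change, t-test, volcano plot) can be compared on the same array.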
We also applied regulatory motif analysis to upstream sequences and co-expression analysis to the gene expression data to validate co-regulation among the genes within each module. Combining the topological distribution of connections between module nodes with information flow analysis, we identified several functionally crucial genes in both strongly and weakly connected modules. In addition to pathway mapping, we predicted the three-dimensional structures of several crucial proteins related to salt tolerance QTLs on the MUFOLD protein structure prediction platform; the predicted structural domains help in understanding the roles of these proteins in the network. In particular, Gene Ontology enrichment analysis confirmed that the largest module in the mechanism network is associated with phosphorylation activity. All the other systems biology methods support this association from different aspects and at different levels, which indirectly validates the relationship between phosphorylation and rice salinity tolerance.
To take advantage of the abundance of next-generation sequencing data, we proposed a novel Bayesian partition algorithm that detects high-order epistasis in resequencing data by exploring high-order interactions. The algorithm efficiently explores phenotype-related loci in different dimensions, and the tool can provide population genetics information on the biological mechanisms underlying complex phenotypes at the sequence level.
Our computational study starts from transcriptome-level gene expression data and provides a comprehensive computational biology pipeline for studying general traits of crops at the genome, transcriptome, and proteome levels. For small-sample, high-dimensional data, we proposed a novel improved volcano plot feature selection algorithm based on bootstrap SVM-RFE for gene expression data.
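A minimal sketch of bootstrap SVM-RFE, the component the improved volcano plot builds on, may make the idea concrete. It assumes a linear SVM without a bias term, trained by Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss (a stand-in for whatever SVM solver the dissertation used), removes one feature per round by smallest squared weight, and summarizes stability as the fraction of bootstrap resamples in which each gene ranks in the top k. How the dissertation combines this with fold change into its new volcano plot measure is not specified here, so that step is left out; all function names and defaults are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (no bias term) via Pegasos-style stochastic sub-gradient
    descent on the L2-regularized hinge loss. y must be in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink for the regularizer, then step on the hinge loss if violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, seed=0):
    """Feature indices from most to least important: retrain, then drop the
    feature with the smallest squared weight, until none remain."""
    remaining = list(range(len(X[0])))
    removed = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y, seed=seed)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        removed.append(remaining.pop(weakest))
    return removed[::-1]


def bootstrap_svm_rfe(X, y, n_boot=20, top_k=10, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n, freq = len(X), [0] * len(X[0])
    for b in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ranking = svm_rfe([X[i] for i in idx], [y[i] for i in idx], seed=b)
        for j in ranking[:top_k]:
            freq[j] += 1
    return [f / n_boot for f in freq]
```

With tens of thousands of genes, removing one feature per round is slow; practical SVM-RFE implementations usually drop a fixed fraction of the remaining features each round.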
This work also proposed the Microarray-QTL test, which brings QTL information into the evaluation of feature selection algorithms. Additionally, a Bayesian partition method was proposed to support further studies on resequencing data. The pipeline was first applied to the study of rice salinity tolerance, and by constructing the mechanism network, this work sheds new light on the mechanism of salt tolerance. More broadly, it provides a systems biology view for studying plant traits in general.
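The abstract does not give the Bayesian partition model itself, so the following is only a generic illustration of the idea under stated assumptions: individuals are partitioned by their joint genotype at a candidate SNP set, each cell's binary phenotype gets a Beta(1, 1) prior, and the log marginal likelihood of the partition is compared with that of a single undivided group. Larger partitions are penalized automatically, because every extra cell spreads its prior over outcomes the data must then fit. The exhaustive search over SNP sets up to a given order is a naive stand-in for the dissertation's search strategy, and all names are hypothetical.

```python
from itertools import combinations
from math import lgamma


def log_beta_binomial(cases, controls, alpha=1.0, beta=1.0):
    """Log marginal likelihood of one cell's case/control outcomes under a Beta(alpha, beta) prior."""
    return (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
            + lgamma(alpha + cases) + lgamma(beta + controls)
            - lgamma(alpha + beta + cases + controls))


def partition_score(genotypes, phenotype, snps):
    """Sum of per-cell log marginals after grouping individuals by joint genotype at `snps`."""
    cells = {}
    for g, y in zip(genotypes, phenotype):
        key = tuple(g[s] for s in snps)
        counts = cells.setdefault(key, [0, 0])
        counts[0 if y else 1] += 1  # [cases, controls]
    return sum(log_beta_binomial(c, d) for c, d in cells.values())


def best_interaction(genotypes, phenotype, max_order=2):
    """Score every SNP set up to max_order; return (snps, log Bayes factor vs. no partition)."""
    null = partition_score(genotypes, phenotype, ())
    n_snps = len(genotypes[0])
    best = ((), 0.0)
    for k in range(1, max_order + 1):
        for snps in combinations(range(n_snps), k):
            log_bf = partition_score(genotypes, phenotype, snps) - null
            if log_bf > best[1]:
                best = (snps, log_bf)
    return best
```

On a toy XOR pattern, where the phenotype is SNP0 XOR SNP1 and neither SNP has a marginal effect, the search returns (0, 1): the pairwise partition scores far above the single-SNP partitions and also above the three-SNP partition that adds an irrelevant SNP.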
Keywords/Search Tags: Feature selection, machine learning, computational biology, systems biology, mechanism network, salt tolerance