Identification and Clustering of CLAVATA3/ESR-related (CLE) Genes in Plants

Posted on: 2019-10-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Zhang
GTID: 1360330572482943
Subject: Botany
Abstract/Summary:
The vascular system, also known as the vascular tissue system, comprises all the vascular tissues of the plant and plays a vital role in material transport, mechanical support and signal transmission during plant growth and development. The CLE (CLAVATA3/Embryo Surrounding Region-related) family, together with its receptors and related pathways, plays an important role in regulating vascular system development, for example by inhibiting the proliferation and division of stem cells in the shoot apical meristem (SAM) and root apical meristem (RAM) and by maintaining both meristems. Different CLE genes are expressed in different parts of the plant, and their expression sites are tissue-specific. The identification and classification of CLE genes is therefore crucial to the study of plant growth and development. CLE family members are small secreted proteins. Most are 50-200 aa in length, with an N-terminal signal peptide, a variable domain and a C-terminal conserved motif (the CLE motif). In homology searches, the N-terminal signal peptide and the variable region tend to cause interference, the C-terminal conserved motif is very short, and the CLE family differs between species, so CLE genes are hard to predict with existing sequence prediction methods.

This research covers the following four aspects. First, we developed a new method for predicting CLE family genes. The method first calculates the weight of each site in the 12 aa CLE motif from the 12 aa motifs of reported CLE genes, and then constructs a scoring matrix specific to CLE genes from these per-site weights. We then calculated the scores of all sequences in the transcriptome. Next, we used the C4.5 decision tree algorithm, an artificial neural network (ANN) and a support vector machine (SVM) to construct classifiers that integrate the characteristics of reported CLE family genes, including sequence score, position of the CLE motif, signal peptide score and sequence length. Applying these classifiers, we predicted a total of 2156 CLE family candidates in 69 species. Among them, 625 had never been reported as small secreted proteins.

Second, members of the CLE gene family were clustered and grouped according to the sequences of their conserved CLE motifs. Because the functions and expression sites of the CLE gene family are diverse, high-quality predicted CLE genes need to be clustered by sequence similarity and grouped according to the clustering result. From the sequences of the conserved motifs we predicted the differences and associations between the corresponding functions. However, the classification of CLE family genes is confused, and so far there is no recognized classification system. To address this problem, we calculated the Euclidean distances between the 2156 previously identified CLE family candidate genes based on their conserved CLE motifs. Clustering analysis was then carried out based on the CLE conserved motifs of the model plant Arabidopsis thaliana. We grouped the 2156 candidate genes into 6 large groups and 12 subgroups. For each candidate we statistically analyzed biological indicators such as motif score, sequence length, motif position, signal peptide score, and the length and sequence of the tail amino acids, and from these we created a complete classification system for CLE family genes. Moreover, CLE family genes are species-specific and show different sequence biases in monocots, dicots and other species. In addition, the evolutionary relationships between species make it possible to explore the origin of the CLE gene family, which is important for understanding its evolutionary pattern.

Third, CLE family candidate genes from Arabidopsis thaliana, rice and Populus trichocarpa were selected, and their expression patterns were analyzed using retrieved expression data. From these three species, candidate genes with tissue-specific expression patterns were selected. For example, Arabidopsis thaliana AtCLE2 and AtCLE6 are specifically expressed in the root, and AtCLE46 is specifically expressed in the xylem. The rice TDIF-like gene LOC_Os02g56490 is specifically expressed in the shoot apical meristem. In Populus trichocarpa, TDIF-like genes such as Potri.002G241300, Potri.012G019400, Potri.001G049700 and Potri.011G102400 are specifically expressed in the phloem region of vascular tissues; CLE9/10 homologous genes such as Potri.014G156600, Potri.008G115600, Potri.010G130400 and Potri.009G068800 are specifically expressed in the xylem region of vascular tissues.

Fourth, we explored the relationship between CLE family genes and other types of small secreted proteins. By analyzing the prediction results for the CLE gene family, we found that the family is closely linked to other types of small secreted proteins in plants. These other small secreted proteins may have expression patterns and regulatory mechanisms similar to those of the CLE family, and may even share a common origin.

The set of CLE family genes predicted in this study is the most comprehensive CLE family gene database to date in terms of both quantity and quality, and it provides a theoretical basis for subsequent structural and functional analysis of CLE family genes. The CLE family classification system constructed here makes it possible to explore the origin of the CLE gene family based on the evolutionary relationships between species, and it plays an important role in exploring the evolutionary model of the family. We also analyzed the expression patterns of CLE family candidate genes in Arabidopsis thaliana, rice and Populus trichocarpa and identified the CLE family genes specifically expressed in the plant vascular system, which provides a theoretical basis for exploring the relationship between the origin of CLE family genes and the origin of vascular tissue. Moreover, this study offers guidance for the prediction of other types of small secreted proteins, and for subsequent studies of their structure and function, expression patterns and regulatory mechanisms.
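The motif-scoring step described in the abstract (per-site weights from known 12 aa CLE motifs, a CLE-specific scoring matrix, then a score for every sequence) can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the version below assumes a standard log-odds position weight matrix with a uniform amino acid background and a pseudocount; the function names `build_scoring_matrix` and `best_window` are hypothetical, and the short motif list in the usage note is toy input, not the dissertation's training set.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MOTIF_LEN = 12  # length of the conserved CLE motif, as stated in the abstract


def build_scoring_matrix(motifs, pseudocount=1.0):
    """Build a per-position log-odds matrix from aligned 12 aa motifs.

    Assumption: uniform background frequency (1/20) and an additive
    pseudocount; the dissertation's actual weighting may differ.
    """
    if any(len(m) != MOTIF_LEN for m in motifs):
        raise ValueError("every motif must be exactly 12 residues long")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(motifs) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(MOTIF_LEN):
        counts = Counter(m[pos] for m in motifs)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        matrix.append(column)
    return matrix


def best_window(sequence, matrix):
    """Slide a 12 aa window along the sequence.

    Returns (score, start) for the highest-scoring window, or None when
    the sequence is shorter than the motif or has no scorable window.
    """
    best = None
    for start in range(len(sequence) - MOTIF_LEN + 1):
        window = sequence[start:start + MOTIF_LEN]
        if any(aa not in AMINO_ACIDS for aa in window):
            continue  # skip windows containing ambiguous residues
        score = sum(matrix[i][aa] for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best
```

As a usage example, `build_scoring_matrix(["RTVPSGPDPLHH", "RQVPTGSDPLHH", "HEVPSGPNPISN"])` followed by `best_window(protein_sequence, matrix)` yields the best motif score and its position. These two values correspond to the "sequence score" and "position of the CLE motif" features that the abstract feeds, together with signal peptide score and sequence length, into the C4.5, ANN and SVM classifiers.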
Keywords/Search Tags: CLE gene family, small secreted protein, clustering and classification analysis, machine learning