| Trees provide large amounts of biomass and bioenergy for people. However, their long growth cycles and relatively large genomes make it difficult to conduct molecular biology experiments on these plants directly. Because of its fast growth, easy propagation, relatively small genome and amenability to transgenic research, poplar has become a good model organism for woody plants. In addition, poplar can be planted in most regions of China and is the most adaptable and most widely used tree species in existing plantations. It has become the main raw material for China's wood-based panel and wood pulp industries. Understanding the biological processes of poplar will effectively promote poplar breeding and genetic improvement. Poplar (Populus trichocarpa) is the first forest tree species to have its whole genome sequenced, but many genes in P. trichocarpa still lack functional annotation. This study built a poplar functional gene network and then developed a poplar gene function annotation platform based on existing poplar gene annotations, which can provide important information for the functional annotation of poplar genes. Furthermore, based on the poplar gene network, a bioinformatics analysis of genes involved in wood formation and development was carried out using the developed platform, which provides a reference for understanding wood formation at the system level. The main results are as follows: 1. Machine learning was used to construct the poplar functional gene network. Starting from genomic and other types of data, a variety of algorithms were used to infer functional associations between poplar genes from a total of 23 data sources. Using a constructed standard training set, a Bayesian framework, a type of machine learning algorithm, was used to score the gene associations, and all associations were eventually integrated into a comprehensive, genome-wide poplar functional gene network: 
PoplarNet. PoplarNet contains 1,967,631 functional gene linkages, covering 70% of the Populus trichocarpa reference genes. Precision-recall analysis showed that the integrated network had higher coverage and accuracy than any functional gene network derived from a single dataset, and than a poplar functional gene network constructed by orthology mapping from the Arabidopsis and rice functional gene networks. An important use of a gene network is the prediction of genes associated with a specific phenotype. Validation on test sets found that the constructed poplar functional gene network performs at a high level both in recapitulating known pathways and in predicting unknown genes. Analysis of the basic network attributes indicates that PoplarNet has the properties typical of biological networks, such as scale-free topology and modular structure. Cluster analysis of the largest sub-network detected 24 core modules, which may play important roles in the life activities of poplar. 2. PoplarGene, a bioinformatics platform for poplar gene function query and analysis, was constructed from P. trichocarpa functional annotations and extensive gene annotations from multiple perspectives. The platform provides intuitive and friendly interfaces that can be used not only to retrieve gene annotation information but also to access a variety of functions, including neighbor-based gene prioritization, context-based gene prioritization, orthology-based network transfer, promoter sequence analysis and gene set enrichment analysis. The PoplarGene platform also integrates a variety of convenient bioinformatics tools. PoplarGene can not only annotate unknown genes through the functional annotations of their associated genes, which greatly improves the coverage of annotated genes, but can also be used for network-based identification of new candidate genes. Furthermore, several case analyses were conducted to show the usefulness of PoplarGene: a) PoplarGene was used to retrieve 
the gene function information of unknown genes, covering many aspects of gene annotation; b) neighbor-based gene prioritization was used to prioritize genes and predict new candidate genes related to xylem cell development, while context-based gene prioritization was used to prioritize stress tolerance and resistance genes and to identify new hub genes; c) the orthology mapping function of PoplarGene was used to construct a functional gene network of eucalyptus from the poplar functional gene network, and the comparative analysis shows that poplar is more suitable as a basis for constructing functional gene networks of other woody plants; d) PoplarGene was used to detect cis-elements in known genes related to xylem cell development, and a set of cis-acting elements was obtained. 3. Based on wood formation genes obtained from a public database, the PoplarGene platform was used to carry out a systematic bioinformatics analysis of the genes associated with poplar wood formation. A poplar wood formation functional gene sub-network was constructed from PoplarNet, and cluster analysis showed that the sub-network consists of 16 closely related modules. Function enrichment analysis found that these modules are mainly involved in sugar binding, sequence-specific DNA binding, transcription regulation, hydrolase activity and cell wall repair. Adaptive evolution analysis of the wood formation genes showed that most of them have experienced negative selection pressure, and only 178 genes have undergone positive selection. In the wood formation functional gene sub-network, most genes under positive selection are located at non-hub nodes, and their degree distribution is significantly lower than that of random nodes in the sub-network. Wood formation genes under negative selection tend to have high connection degrees, and nine of the top 10 nodes ranked by degree are subject to negative selection. A miRNA regulatory network of poplar wood 
formation genes was derived from the PoplarGene platform; it contains 151 poplar microRNAs and 142 poplar genes. The 151 microRNAs derive from 19 microRNA families. The entire network consists of 31 sub-networks, and the largest, Subnet1, contains 42 nodes. Gene function enrichment analysis of the regulatory network showed that these targets are mainly involved in hormone response and transcription factor DNA binding. In conclusion, the construction of the poplar functional gene network and the development of the PoplarGene platform for functional annotation and analysis of poplar genes provide important reference information and a bioinformatics platform for the research community. Moreover, PoplarNet and PoplarGene will be regularly updated as poplar molecular studies accumulate. The overview of the functional gene network and the miRNA regulatory network can help in mining key genes and factors related to poplar wood formation, which will provide more information for further study of wood genetic modification. Genes under negative selection, which usually occupy hub positions in the wood formation functional gene sub-network, can also be considered candidate target genes for wood genetic improvement. The functional roles of the 19 microRNA families in wood formation can be further studied by molecular biology experiments. |
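
The Bayesian scoring and integration step summarized in result 1 can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes a log-likelihood score of the kind commonly used for probabilistic functional gene networks: the odds that an evidence source links gene pairs from the positive standard set, relative to the prior odds, with scores from different sources summed under a naive independence assumption. The gene identifiers, source names and function names are hypothetical and are not taken from PoplarNet.

```python
import math
from collections import defaultdict


def pair(a, b):
    """Unordered gene pair, so (a, b) and (b, a) denote the same linkage."""
    return frozenset((a, b))


def log_likelihood_score(evidence_pairs, positives, negatives):
    """Score one evidence source against a standard set (assumed formula).

    LLS = ln( odds of a true linkage given the evidence / prior odds ).
    """
    pos = sum(1 for p in evidence_pairs if p in positives)
    neg = sum(1 for p in evidence_pairs if p in negatives)
    if pos == 0 or neg == 0:
        raise ValueError("source must overlap both classes of the standard set")
    odds_given_evidence = pos / neg
    prior_odds = len(positives) / len(negatives)
    return math.log(odds_given_evidence / prior_odds)


def integrate(sources, positives, negatives):
    """Sum per-source scores for every supported pair (naive Bayes assumption)."""
    network = defaultdict(float)
    for pairs in sources.values():
        score = log_likelihood_score(pairs, positives, negatives)
        for p in pairs:
            network[p] += score
    return dict(network)


# Hypothetical toy standard set: g1-g3 share a pathway, g4 and g5 do not.
positives = {pair("g1", "g2"), pair("g1", "g3"), pair("g2", "g3")}
negatives = {pair(a, b) for a in ("g1", "g2", "g3") for b in ("g4", "g5")}

# Hypothetical evidence sources, each a set of predicted gene pairs.
sources = {
    "coexpression": {pair("g1", "g2"), pair("g1", "g3"), pair("g1", "g4")},
    "domain_cooccurrence": {
        pair("g1", "g2"), pair("g2", "g3"), pair("g2", "g4"), pair("g3", "g5"),
    },
}

toy_net = integrate(sources, positives, negatives)
```

In this toy example the pair g1-g2 is supported by both sources and therefore receives the highest integrated score, ln 4 + ln 2 = ln 8. A real integration over 23 data sources would also need to handle correlated evidence and score thresholds, which this sketch omits.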