Biological Data Information Analysis Methods And Its Application

Posted on: 2015-07-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H W Chen
Full Text: PDF
GTID: 1360330491952445
Subject: Computer application technology
Abstract/Summary:
In recent years, with the successful completion of the Human Genome Project (HGP), the biological data related to human genetics have been greatly enriched. Consequently, researchers have become keen on systems biology research based on large-scale data such as gene and protein data. Analyzing vast amounts of biological data with bioinformatics methods, exploring and understanding the occurrence and development mechanisms of complex human diseases, and realizing individualized prevention and health care for major complex diseases are the starting points of research work both domestically and worldwide. Authoritative research has suggested that understanding the formation mechanisms of complex diseases systematically requires effective system analysis methods.

This study starts from two key kinds of biological data, genes and proteins. First, we mine gene function from gene expression data and explore the relationships between genes by combining pathway analysis methods. Furthermore, we study protein subcellular localization to better understand the function and structure of proteins, thereby contributing to the development of new drugs for complex diseases.

The dissertation first introduces the basic concepts of the related research and the state of research at home and abroad. It then details gene expression analysis based on expression profile data and protein function research methods based on subcellular localization, and summarizes the characteristics of the existing methods. To understand the functions of genes and proteins more completely and accurately, the following work is carried out.

To address the lack of biological meaning, a gene expression profile biclustering analysis method is proposed to find gene clusters with similar functions. The method applies a similarity score that measures the degree of gene expression change under different conditions, and designs a fitness function to improve the genetic algorithm's search for the optimal gene clustering. Using biological pathway module information, a co-expression analysis method based on biological pathways is proposed to identify co-expressed genes. The extraction of key genes in biological pathways then helps identify pathogenic genes in complex disease analysis.

Exploiting the complementarity of multiple kinds of information and the advantages of an integration strategy, two numerical feature fusion methods are designed. Considering the position and angle of protein amino acid sequences, a numerical coding method that merges the position information of amino acid residues with pseudo amino acid composition information is proposed to extract protein structure information, improving the accuracy of subcellular localization. Starting from the integration of global and local sequence features, a second numerical coding method is designed that describes protein sequences numerically using the amino acid sequence, compressed tripeptide composition information, and local frequency-domain values. The experimental results show that complementary information extracted from different angles describes protein sequences more fully and improves the prediction performance of protein subtype.

To address small sample sizes and class imbalance, we propose a subcellular localization method based on transfer learning. During subcellular localization, an adaptive detection mechanism is applied to maintain prediction accuracy while controlling computational complexity. The experimental results verify the applicability of the transfer learning model to imbalanced data.

To address the low accuracy of protein interaction prediction, a fusion strategy combining protein sequence information and Gene Ontology annotations is proposed to predict protein-protein interactions (PPI). First, the Gene Ontology data are processed by gene function annotation based on expression profiles, and a protein interaction dataset is built using protein subcellular localization. Then the fusion strategy and an ensemble learning model are applied to predict interactions. The experimental results show that the accuracy and generalization of protein interaction prediction are improved.
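
The first fusion encoding described in the abstract combines residue position information with composition information. The abstract does not give the actual formulas, so the following is only a minimal sketch of the general idea under stated assumptions: plain amino acid composition stands in for pseudo amino acid composition, the position feature is taken to be the mean normalized position of each residue type, and the names `composition`, `mean_position`, and `encode` are hypothetical. It is not the author's method.

```python
from typing import List

# The 20 standard amino acids, in a fixed order that defines feature indices.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid among the standard residues in seq.

    Assumption: plain composition is used here as a simple stand-in for
    pseudo amino acid composition, which the abstract names but does not define.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def mean_position(seq: str) -> List[float]:
    """Mean normalized position (1/L .. 1) of each residue type; 0.0 if absent.

    Assumption: this is one illustrative way to encode position information.
    """
    seq = seq.upper()
    length = len(seq)
    features = []
    for a in AMINO_ACIDS:
        positions = [(i + 1) / length for i, c in enumerate(seq) if c == a]
        features.append(sum(positions) / len(positions) if positions else 0.0)
    return features


def encode(seq: str) -> List[float]:
    """Fuse both feature groups by concatenation into one 40-dimensional vector."""
    return composition(seq) + mean_position(seq)


if __name__ == "__main__":
    vector = encode("MKTAYIAKQR")
    print(len(vector))
```

Concatenation is the simplest fusion choice: each feature group keeps its own coordinates, so a downstream classifier such as a support vector machine (listed in the keywords) can weigh the two kinds of information independently.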
Keywords/Search Tags: gene expression profile, biclustering, genetic algorithm, protein subcellular localization, support vector machine, transfer learning, PPI