
Research on Information Analysis and Applied Algorithms for Gene Sequence and Structure

Posted on: 2011-12-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Xiang
GTID: 1220330395985628
Subject: Computer application technology
Abstract/Summary:
As the focus of human genome research has shifted to functional genomics, the accumulation of biological sequence data has opened bright prospects for life sciences research but also poses a severe challenge to current biological data-processing capacity. A main goal of bioinformatics is to mine valuable biological information from this vast body of sequence data in order to understand the structure, function, and evolution of genes and proteins. Research on information analysis of gene sequence and structure is therefore very important. Information about gene sequence and structure can be obtained by comparing sequences and structures, and this comparison is based on alignment. Such comparison supports the determination of genome structure and phylogeny. Proteins are the products of gene expression and the carriers of physiological activity. Protein subcellular location is closely related to protein function, so knowledge of the former provides valuable clues for research on the latter. In protein subcellular localization prediction, the key is how to obtain more complete sequence feature information.
This dissertation focuses on the information in gene sequences and structures and studies it in depth from three aspects: 1) a new method of sequence and structure comparison to improve the accuracy of multiple sequence alignment for highly divergent sequences; 2) phylogenetic analysis based on graphical representation of whole genomes; 3) protein subcellular location prediction based on combined features. The main work is summarized as follows:
(1) Because not every step of the dynamic programming process is needed, this dissertation proposes a more efficient non-dynamic-programming algorithm, a sequence alignment algorithm based on minimum edit distance. Its time complexity is O(nL) and its space complexity is O(n). The fastest other algorithm, proposed by Pevzner and Waterman, takes O(l+Ln) time and O(l+Ln) space.
(2) Because multiple sequence alignment (MSA) is computationally expensive, this dissertation introduces a planar diagram to describe the MSA process, so that every possible alignment can be taken into account. It defines gap insertion, the iteratively updated information value, and the scoring rule for each candidate path, and introduces an ant colony genetic algorithm to explore the solution space and solve the MSA problem. This representation integrates the advantages of both algorithms, improves the ability to find feasible solutions, and avoids premature convergence.
(3) Existing representations of RNA secondary structure suffer from high complexity, degeneracy, and the possibility that different structures correspond to the same representation. To address this, the dissertation proposes three-symbol and four-symbol encodings of RNA secondary structure and uses a binary OR operation to analyze the structure. Structure encoding displays structural information simply and directly, which helps visualize mutation analysis and infer mechanisms of disease.
Structure encoding also provides a good model for structural comparison: similarities and differences between structures are easy to find, which facilitates gene detection and gene function prediction. The method not only distinguishes free bases from base pairs by their positions but also distinguishes different substructures, including pseudoknots.
(4) Phylogenetic analysis usually needs a guide tree, and guide trees suffer from poor similarity at the tree level. Drawing on the idea of graphical representation of biological sequences, this dissertation proposes a new two-dimensional graphical representation of DNA sequences and, based on it, a new method for analyzing the evolutionary relationships of genomes from complete genome sequences. The method obtains the evolutionary distance by measuring the difference between two-dimensional curves. In an experiment comparing the similarity/dissimilarity of coronavirus DNA sequences and building the phylogenetic tree with the PHYLIP package, the result is consistent with the actual evolutionary tree. The method uses a similarity matrix of the whole genomes instead of an evolutionary distance matrix and does not need multiple sequence alignment. It not only reflects the relationships between species well but also greatly reduces time and space complexity.
(5) This dissertation introduces a protein sequence encoding method based on distance frequency, which represents a protein sequence as a 220-dimensional feature vector comprising 20 amino acid components and 200 distance-frequency components for identical amino acids. A support vector machine is then used for protein subcellular localization prediction. The experimental results show the effectiveness of the method...
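For orientation on item (1), the sketch below shows the textbook dynamic-programming edit distance that alignment methods are commonly compared against. It is not the non-dynamic-programming algorithm proposed in the dissertation, whose details the abstract does not give. The function name `edit_distance` and the unit costs for insertion, deletion, and substitution are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classical dynamic-programming edit distance with unit costs.

    Baseline only: runs in O(len(a) * len(b)) time and keeps two rows
    of the table in memory. This is NOT the dissertation's algorithm.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance("ACGT", "AGT")` counts the single deletion of `C`. Every cell of the table is filled here, which is the cost that item (1) aims to avoid.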
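To make the 220-dimensional encoding in item (5) more concrete, here is a minimal sketch under stated assumptions. The abstract says only that the vector combines 20 amino acid components with 200 distance-frequency components for identical amino acids. The sketch assumes the 20 components are composition frequencies and that the 200 components are 20 residue types times 10 separations (`MAX_DISTANCE = 10`). The function name `distance_frequency_features` and the normalization are hypothetical and may differ from the dissertation's definition.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MAX_DISTANCE = 10  # assumption: 20 residues x 10 separations = 200 features


def distance_frequency_features(sequence, max_distance=MAX_DISTANCE):
    """Return composition (20 values) followed by distance frequencies.

    Distance feature (residue k, separation d) is the fraction of position
    pairs (i, i + d) at which residue k occurs at both positions.
    """
    residues = [c for c in sequence.upper() if c in INDEX]  # skip unknown symbols
    n = len(residues)
    composition = [0.0] * len(AMINO_ACIDS)
    distance = [0.0] * (len(AMINO_ACIDS) * max_distance)
    if n == 0:
        return composition + distance

    counts = [0] * len(AMINO_ACIDS)
    for aa in residues:
        counts[INDEX[aa]] += 1
    composition = [count / n for count in counts]

    for d in range(1, max_distance + 1):
        pairs = n - d
        if pairs <= 0:
            break  # sequence too short for this and larger separations
        matches = [0] * len(AMINO_ACIDS)
        for i in range(pairs):
            if residues[i] == residues[i + d]:
                matches[INDEX[residues[i]]] += 1
        for k, count in enumerate(matches):
            distance[k * max_distance + (d - 1)] = count / pairs
    return composition + distance
```

A vector built this way has a fixed length of 220 regardless of protein length, which is what allows it to serve as input to a classifier such as the support vector machine mentioned in the abstract.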
Keywords/Search Tags: Bioinformatics, Pairwise alignment, Multiple sequence alignment, Ant colony algorithm, Phylogenetic analysis