
Spectrum-like Graphical Representation Of Biological Sequence And Its Applications

Posted on: 2015-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: J N Han
GTID: 2250330428464960
Subject: Basic mathematics
Abstract/Summary:
With the emergence of genomics and proteomics, graphical representations of biological sequences have expanded into bioinformatics, growing from qualitative, pictorial representations into quantitative, numerical characterizations. In this thesis, we propose spectrum-like visual diagrams of DNA and protein sequences, together with a mathematical model that describes them. We then discuss similarity analysis based on these representations and the corresponding phylogenetic trees, as well as further applications.

(1) Based on several important properties, there are six possible ways to divide the nucleotides into three categories. On this basis, we give a spectrum-like graphical representation of DNA sequences that reflects information about nucleotide structure and function. The amplitude frequencies of the spectrum-like graph are then calculated as descriptors, and a mathematical model is given for computing the theoretical values of these descriptors, which are used to compare DNA sequences for similarity. Similarity analyses of the coding sequences of the beta-globin gene from 11 species and of 24 coronavirus genomes demonstrate the validity of the method.

(2) We propose a spectrum-like graphical representation of proteins based on the hydrophilicity-hydrophobicity classification of amino acids. Amplitude frequencies are extracted as descriptors and applied to the mitochondrial proteome sequences of 20 species. The differences among the sequences are analyzed and a phylogenetic tree is constructed; the results are consistent with known evolutionary information. The method is particularly effective for very long proteome sequences and greatly reduces computational complexity. In addition, χ² values of the amplitude frequencies are obtained for 13 proteins in 9 species.
The correlation matrix based on these χ² values is then used to analyze similarity.

(3) From the spectrum-like graphical representation of proteins we obtain a feature vector, which is combined with the composition of the 20 amino acids in the sequence to form a new feature vector, so that each protein sequence is represented by a single feature vector. Using a support vector machine (SVM), we predict the subcellular locations of proteins in three apoptosis protein datasets, and the prediction results show that the proposed algorithm is effective.

In summary, this thesis proposes a new method for measuring biological sequence information based on spectrum-like representations. The model incorporates several important physical and chemical properties of nucleotides and amino acids, and it balances the extraction of local and global features of biological sequences. The method is applied not only to similarity analysis but also to the prediction of protein subcellular localization.
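One way to see where the count of six in part (1) comes from: splitting the four nucleotides into three non-empty categories forces exactly one category to hold two nucleotides, and there are C(4, 2) = 6 ways to choose that pair (equivalently, the Stirling number S(4, 3) = 6). The Python sketch below enumerates these partitions. It only illustrates the counting; the abstract does not say which physical or chemical property motivates each grouping, and the name `three_class_partitions` is hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = ("A", "C", "G", "T")


def three_class_partitions(symbols=NUCLEOTIDES):
    """Return every way to split four symbols into three non-empty classes.

    With four symbols and three classes, exactly one class holds two symbols
    and the other two classes are singletons, so each partition is determined
    by the choice of that pair: C(4, 2) = 6 partitions in total.
    """
    partitions = []
    for pair in combinations(symbols, 2):
        singletons = [s for s in symbols if s not in pair]
        partitions.append((frozenset(pair),) + tuple(frozenset([s]) for s in singletons))
    return partitions


# Prints six lines; the first is ['AC', 'G', 'T'].
for partition in three_class_partitions():
    print(["".join(sorted(cls)) for cls in partition])
```

Each printed partition could then be read as one candidate three-level encoding of a DNA sequence, with every nucleotide replaced by the label of its class.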
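The abstract does not define how the correlation matrix is built from the χ² values. A minimal sketch, assuming each species is summarized by a vector of χ² values (one entry per protein) and that the similarity between two species is the Pearson correlation of their vectors. The names `pearson` and `correlation_matrix` are hypothetical, and the profile values in the usage example are placeholders, not data from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Nested dict: matrix[a][b] is the Pearson correlation of profiles a and b."""
    return {
        a: {b: pearson(pa, pb) for b, pb in profiles.items()}
        for a, pa in profiles.items()
    }


# Placeholder profiles: one value per protein for each species (not real data).
profiles = {
    "species_a": [1.0, 2.0, 3.0],
    "species_b": [2.0, 4.0, 6.0],
    "species_c": [3.0, 2.0, 1.0],
}
for name, row in correlation_matrix(profiles).items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values near 1 indicate species whose χ² profiles rise and fall together; a distance such as 1 minus the correlation could feed a clustering step if a tree is wanted.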
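The abstract states that a phylogenetic tree is built from the differences between descriptor vectors, but it does not name the distance or the clustering method. As an assumption, the sketch below uses Euclidean distance between descriptor vectors and UPGMA (average-linkage) clustering, a common choice for distance-based trees; neighbor-joining would be an equally plausible alternative. It returns only the tree topology in Newick format, without branch lengths. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(descriptors):
    """Cluster named descriptor vectors with UPGMA; return a Newick topology string."""
    sizes = {name: 1 for name in descriptors}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): euclidean(descriptors[a], descriptors[b])
        for a, b in combinations(descriptors, 2)
    }
    while len(sizes) > 1:
        # Merge the closest pair of current clusters (sorted for a stable label).
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the merged cluster to every other cluster.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        # Drop every entry that still refers to one of the merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = na + nb
    (root,) = sizes
    return root + ";"


toy = {"s1": [0.0, 0.0], "s2": [0.0, 1.0], "s3": [5.0, 5.0]}
print(upgma(toy))  # ((s1,s2),s3);
```

UPGMA assumes a roughly constant rate of change across lineages; if that assumption is doubtful for the descriptors in question, a method that does not require it (such as neighbor-joining) may be the safer choice.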
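Part (3) combines the spectrum-like feature vector with the amino acid composition, that is, the relative frequency of each of the 20 standard amino acids in the sequence. The sketch below computes the composition and appends it to a spectrum-derived vector that is passed in unchanged, because the abstract does not spell out how that vector is computed. `combined_feature_vector` is a hypothetical name, and skipping non-standard residue codes is an assumption of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def amino_acid_composition(sequence):
    """Relative frequency of each standard amino acid, in AMINO_ACIDS order.

    Characters outside the 20 standard codes (e.g. X, B, Z, gaps) are skipped.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def combined_feature_vector(sequence, spectrum_features):
    """Hypothetical helper: spectrum-derived features followed by the 20 composition values."""
    return [float(x) for x in spectrum_features] + amino_acid_composition(sequence)


vector = combined_feature_vector("ACCA", [0.5, 0.25])
print(len(vector))  # 22
print(vector[:4])   # [0.5, 0.25, 0.5, 0.5]
```

Vectors built this way for every protein in a dataset could then be given to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`, to train a subcellular-location classifier.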
Keywords/Search Tags: Genome, Spectrum-like graphical representation, Similarity analysis, Phylogenetic tree, Protein subcellular localization