
Numerical Feature Extraction Of Protein Sequences And Its Applications

Posted on: 2018-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: T Song
Full Text: PDF
GTID: 2310330533463194
Subject: Computational Mathematics
Abstract/Summary:
Proteins are the material basis of life, and a protein's sequence determines its structure and function. Feature extraction from protein sequences is therefore an important step in uncovering the laws of life hidden in those sequences, and it builds on the known trends of change in organisms. This thesis studies methods for extracting numerical features from protein sequences and applies them to protein sequence similarity analysis and to a segmentation algorithm. First, three categories of features are extracted from a protein sequence: the pseudo-Markov transition probabilities of amino acids, the content ratios of amino acids, and the position ratios of amino acids. To analyze how the three categories relate, the thesis establishes a mathematical relationship between the pseudo-Markov transition probability and the content ratio of amino acids. A 440-dimensional feature vector that integrates the three categories is used to characterize each protein sequence. Similarity is judged on the principle that the larger (smaller) the Euclidean distance between two feature vectors, the lower (higher) the similarity of the corresponding sequences. The three kinds of feature vectors are each applied to similarity analysis on four protein data sets (the ND5 data set, the F10 and G11 data set, the Beta protein data set, and the lactoferrin and transferrin data set). The Euclidean distance similarity matrix, the phylogenetic tree, and the similarity heat map are used to present the similarity results from three angles. The experimental results show that the three kinds of features effectively reflect the composition and distribution of a protein sequence and are a powerful tool for analysis and research. Inspired by the DNA segmentation algorithm, the thesis then uses the content ratio of amino acids to construct a segmentation algorithm for protein sequences. Experiments show that the protein segmentation algorithm and the DNA segmentation algorithm have similar properties, and the protein segmentation theory is proved by rigorous mathematical deduction. This work is a direct generalization of the DNA segmentation algorithm and theory. It is a preliminary attempt to use the content ratio feature in a protein segmentation algorithm, and it provides a reference for subsequent protein sequence segmentation algorithms and their applications.
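The abstract does not give formulas for the three feature categories, so the following is only a minimal sketch under stated assumptions, not the thesis's actual definitions. It assumes the 440 dimensions split as 400 adjacent-pair frequencies plus 20 content ratios plus 20 position ratios, that the "pseudo" transition probability is the pair count divided by the total number of adjacent pairs, and that the position ratio is the sum of an amino acid's 1-based positions divided by the sum of all positions. All function names are hypothetical.

```python
from math import sqrt

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def content_ratio(seq):
    """Fraction of the sequence made up of each amino acid (20 values)."""
    counts = [0] * 20
    for aa in seq:
        counts[INDEX[aa]] += 1
    return [c / len(seq) for c in counts]


def position_ratio(seq):
    """Assumed definition: sum of the 1-based positions of each amino acid,
    divided by the sum of all positions 1 + 2 + ... + L (20 values)."""
    n = len(seq)
    total = n * (n + 1) / 2
    sums = [0] * 20
    for pos, aa in enumerate(seq, start=1):
        sums[INDEX[aa]] += pos
    return [s / total for s in sums]


def transition_ratio(seq):
    """Assumed definition: count of each adjacent pair (a, b) divided by the
    number of adjacent pairs L - 1 (400 values). Requires len(seq) >= 2."""
    counts = [0] * 400
    for a, b in zip(seq, seq[1:]):
        counts[INDEX[a] * 20 + INDEX[b]] += 1
    return [c / (len(seq) - 1) for c in counts]


def feature_vector(seq):
    """Concatenate the three feature groups into one 440-dimensional vector."""
    return transition_ratio(seq) + content_ratio(seq) + position_ratio(seq)


def euclidean(u, v):
    """Euclidean distance; a larger distance is read as lower similarity."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

Under these assumptions, a pairwise distance matrix over a data set is obtained by calling `euclidean(feature_vector(s), feature_vector(t))` for every pair of sequences `s` and `t`; such a matrix is the kind of input from which a phylogenetic tree or a similarity heat map can be built. The sketch accepts only the 20 standard one-letter residue codes and raises `KeyError` on anything else.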
Keywords/Search Tags: protein sequence, numerical characteristics, sequence similarity analysis, phylogenetic tree, similarity heat map, segmentation algorithm, segmentation theory