
Protein Sequences Comparison And Application Based On Physicochemical Properties And Position-Feature Of Amino Acids

Posted on: 2019-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: L L Yu
GTID: 2370330542496773
Subject: Operational Research and Cybernetics
Abstract/Summary:
With the rapid development of computer technology and high-throughput biological experiment technology, bioinformatics is advancing quickly and a large number of biological molecular sequences have been obtained. Extracting information from these sequences effectively and analyzing the relationships between them is an important basis for understanding the development of life. Proteins are the material basis of life; they control and regulate the various functions of the cell, so the effective study of protein sequences is a significant subject. This thesis considers both the physicochemical properties of amino acids and the relative positions of amino acids in the sequence. Based on the theory of graph energy, a new method is proposed to convert protein sequences into numerical vectors. The method is then used, in MATLAB, to analyze the similarity of protein sequences and to predict functional proteins. The results confirm the feasibility of the model. The main contents and innovations are as follows:

(1) Based on the physicochemical properties of amino acids and the relative position information of the 20 amino acids in protein sequences, 0-1 sparse matrices are constructed and numerical vectors are obtained. First, the amino acids are ordered according to experimental data and the numerical weights of two important physicochemical properties. The position sparse matrix is then obtained by scanning the protein sequence. Next, the bipartite graph of the sequence is constructed from the sparse matrix. Finally, the protein sequence is converted into a numerical vector by calculating the energy of the bipartite graph.

(2) The numerical vectors are converted to probability distribution vectors. We modify the relative entropy distance and define a symmetric relative entropy distance. After calculating the distance between each pair of protein sequences with the symmetric relative entropy, we construct a phylogenetic tree in order to analyze the results.

(3) Using the numerical description of sequences proposed in this thesis, a similarity analysis of protein sequences is carried out. To verify the effectiveness and feasibility of the proposed method, it is applied to the similarity analysis of 9 NADH dehydrogenase 5 (ND5) protein sequences, transferrin (TF) sequences, antifreeze proteins (AFPs), and 50 ? globin protein sequences. Compared with the phylogenetic trees constructed by existing algorithms or by Clustal W, a widely used multiple-sequence alignment program, the experimental results are largely consistent and in some cases more reasonable.

(4) Based on the numerical transformation model, a new feature extraction method is constructed that combines the component momentum vector (CMV) with the weighted amino acid composition features. The feature vectors of the samples are then input into a support vector machine classifier, and 5-fold cross-validation is used to determine the parameters of the model. Prediction performance is evaluated with 4 classic evaluation indexes. The results show that the model predicts well and generalizes across many datasets, including anticancer peptides, allergenic proteins, bacterial adhesins, eukaryotic cytotoxic proteins, and HIV proteins.

In conclusion, the numerical model based on the physicochemical properties of amino acids and the relative positions of amino acids in the sequence is reasonable and effective for both the similarity analysis of proteins and the prediction of functional proteins. Moreover, it is helpful for research on drug-target interaction, the development of vaccines, and the treatment of disease.
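The distance step in item (2) can be illustrated with a short sketch. The abstract does not give the exact form of the modified relative entropy, so the code below assumes the standard symmetrized Kullback-Leibler divergence, KL(P||Q) + KL(Q||P). The function names and the small smoothing constant `eps` are hypothetical choices made for this illustration, not the thesis's own definitions, and the thesis itself reports using MATLAB rather than Python.

```python
import math


def to_distribution(vec):
    """Normalize a non-negative numerical vector so that its entries sum to 1."""
    total = sum(vec)
    if total <= 0:
        raise ValueError("vector must have a positive sum")
    return [v / total for v in vec]


def relative_entropy(p, q, eps=1e-12):
    """KL(p || q) in nats. eps is an assumed guard against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_relative_entropy(p, q):
    """Assumed symmetric form: KL(p || q) + KL(q || p)."""
    return relative_entropy(p, q) + relative_entropy(q, p)


def distance_matrix(vectors):
    """Pairwise symmetric relative entropy between numerical sequence vectors."""
    dists = [to_distribution(v) for v in vectors]
    n = len(dists)
    return [
        [symmetric_relative_entropy(dists[i], dists[j]) for j in range(n)]
        for i in range(n)
    ]
```

A matrix produced this way is symmetric with a zero diagonal, so it could be passed to a distance-based tree-building method such as UPGMA or neighbor joining. The abstract does not state which tree-construction method was used.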
Keywords: similarity analysis, graph energy, feature vector, prediction of functional protein