
Protein Subcellular Localization Based On Local Feature Expression And Global Statistical Dimension Reduction Algorithm

Posted on: 2018-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
GTID: 2370330518458872
Subject: Computer application technology
Abstract/Summary:
Protein subcellular localization has become a research focus in biology because the location of a protein within the cell is closely related to its function. With the rapid growth of biological data, traditional biological experiments can no longer keep up with demand; efficient computational methods save biologists time and labor and have become an important tool for predicting protein subcellular location. This thesis proposes two local feature expression methods and uses global statistical dimension reduction algorithms to analyze their effect on protein data. The work covers the following three aspects.

1. This thesis proposes a new local feature expression method, PSSM-SAA, and combines it with the global statistical dimensionality reduction algorithm LDA to balance the extracted information. PSSM-SAA first divides the protein sequence into subsequences of unequal length by applying a segmentation idea to the PSSM, then extracts the distribution density of amino acids in each segment, so that each protein sequence is expressed as a 1600-dimensional feature vector. PSSM-SAA thus captures discriminative information about amino acid distribution in local protein evolution. To reduce data redundancy, linear discriminant analysis (LDA) is used to reduce the dimension of the PSSM-SAA features, and the two are combined into a prediction model for protein subcellular localization. In this way the local information in the feature expression and the global information in the dimensionality reduction are balanced. The experimental results show that the local information extracted by PSSM-SAA can be reduced in dimension comprehensively and effectively, which indicates that the model combining PSSM-SAA and LDA performs well.

2. In protein subcellular localization, dimension reduction algorithms are generally used simply to reduce data redundancy, and there is relatively little research on how they affect protein data. To address this, the thesis introduces two dimension reduction algorithms, median linear discriminant analysis (MDA) and median-mean line based discriminant analysis (MMLDA), which were proposed to handle outliers and class-center deviation in data, and compares them with the classical LDA algorithm. MDA and MMLDA showed no obvious improvement over LDA, possibly because the protein data do not suffer from outliers or class-center deviation. On data of this kind LDA achieves better results, which further supports the conclusion that the PSSM-SAA and LDA model is effective and feasible.

3. This thesis also attempts a local feature expression method that incorporates secondary structure prediction information to predict protein subcellular location. The method uses the Chou-Fasman method to predict the secondary structure type of each peptide segment of the protein sequence.
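The segment-then-reduce idea in point 1 can be illustrated with a rough sketch. This is not the thesis's PSSM-SAA implementation: the abstract does not give the segmentation rule or how the 1600 dimensions are built, so the segment count, the sigmoid scaling of PSSM scores, the per-segment column mean used as a "distribution density", and the function names `segment_bounds`, `segmented_pssm_features`, and `lda_projection` are all assumptions made for illustration. The LDA step is the textbook Fisher formulation on synthetic data.

```python
import numpy as np

def segment_bounds(length, n_segments):
    """Cut range(length) into n_segments contiguous pieces (lengths may differ)."""
    if length < n_segments:
        raise ValueError("sequence shorter than the number of segments")
    edges = [i * length // n_segments for i in range(n_segments + 1)]
    return list(zip(edges[:-1], edges[1:]))

def segmented_pssm_features(pssm, n_segments=4):
    """Map an (L, 20) PSSM to a fixed-length vector of n_segments * 20 values.

    Assumption: each segment is summarized by the column means of the
    sigmoid-scaled scores, one value per amino acid type.
    """
    scaled = 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))
    parts = [scaled[a:b].mean(axis=0) for a, b in segment_bounds(len(scaled), n_segments)]
    return np.concatenate(parts)

def lda_projection(X, y, n_components):
    """Fisher LDA: leading eigenvectors of pinv(Sw) @ Sb as projection columns."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # pinv because Sw is singular when there are fewer samples than features
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

# Synthetic stand-in data: 40 "proteins" of varying length, two classes.
rng = np.random.default_rng(0)
y = np.array([i % 2 for i in range(40)])
X = np.stack([
    segmented_pssm_features(rng.normal(loc=label, scale=2.0, size=(int(rng.integers(50, 120)), 20)))
    for label in y
])
W = lda_projection(X, y, n_components=1)
Z = (X - X.mean(axis=0)) @ W  # reduced representation, one column per component
```

Whatever the sequence length, every protein ends up with a vector of the same size, which is what lets a global method such as LDA be applied across the whole data set. With C classes, LDA yields at most C - 1 useful components.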
Keywords/Search Tags:Protein subcellular localization, Local feature expression, Segment distribution, Global statistical dimension reduction algorithm