Research on Protein Sequence Similarity Based on Dynamic Time Warping

Posted on: 2019-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y L J Zhang
GTID: 2370330548982860
Subject: Applied Mathematics
Abstract/Summary:
As the volume of biological data grows, computers, with their fast calculation speed and low experimental cost, play an increasingly important role in processing biological information. How to develop new algorithms for biological sequences from prior knowledge and/or data has become an important research topic. This thesis focuses on the dynamic time warping (DTW) algorithm and combines it with various machine learning methods to study similarity measures for biological sequences. DTW is a distance algorithm that builds on the idea of dynamic programming to handle time sequences of different lengths. Protein sequences can be regarded as biological signals of different lengths. Therefore, DTW is introduced into our research to reduce the computational errors caused by the specific properties of biological sequences. For different kinds of biological sequences, DTW is combined with other machine learning algorithms to develop new similarity measures. The main work of the thesis is as follows:

A novel signal peptide feature extraction algorithm based on compressive sensing and dynamic time warping is proposed to study the similarity of signal peptides, which are short and vary widely in amino acid sequence. First, compressive sensing is applied to project high-dimensional sequences onto a low-dimensional space and extract observation signals, removing redundant information while retaining the main information of the biological sequence. Then DTW is introduced to extract new feature vectors according to the similarity to the training samples. The features extracted by this algorithm not only embody important information such as amino acid composition, sequence order, and primary structure, but can also nonlinearly align different regions of the sequence in the time dimension. Finally, machine learning methods are used for analysis and verification. The experimental results show that this method obtains highly discriminative features of signal peptides and can distinguish secreted proteins from non-secreted proteins based on similarity.

A subcellular location classification algorithm based on wavelet packet decomposition and dynamic time warping is proposed to compare and classify long sequences by computing their similarity. First, for four kinds of subcellular localization, each protein sequence is converted into a numerical signal according to the physical and chemical properties of its amino acids. Then the idea of signal time-frequency decomposition is introduced to reduce the processing error caused by quantifying biological sequences with a single physical index. To this end, the signal is decomposed by the wavelet packet decomposition algorithm to obtain signal fragments in different frequency bands. Finally, DTW is used to calculate the similarity of the signal segments in each frequency band, from which the similarity between proteins is obtained. Consequently, protein sequences of unequal length can be classified.
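
To illustrate how DTW can compare protein sequences of unequal length, the following is a minimal Python sketch and not the implementation used in the thesis. It assumes that residues are encoded with the Kyte-Doolittle hydropathy index (the abstract does not say which physicochemical properties are used) and that the local cost is the absolute difference between values. The names `encode` and `dtw_distance` are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index per amino acid.
# Assumption: the thesis does not name the property it uses for encoding.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Convert a protein sequence (one-letter codes) into a numerical signal."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def dtw_distance(x, y):
    """Classic DTW distance between two numeric signals of any lengths.

    Uses absolute difference as the local cost and no window constraint
    (both are illustrative choices, not taken from the thesis).
    """
    n, m = len(x), len(y)
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # x advances, y[j-1] is reused
                acc[i][j - 1],      # y advances, x[i-1] is reused
                acc[i - 1][j - 1],  # both advance together
            )
    return acc[n][m]


if __name__ == "__main__":
    # Two toy sequences of different lengths (made up for illustration).
    a = encode("MKLLVLA")
    b = encode("MKLLLVLLA")
    print(dtw_distance(a, b))
```

Because the warping path may reuse a position in either signal, a repeated residue in one sequence can be matched to a single residue in the other at no extra cost, which is what allows sequences of different lengths to be compared directly.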
Keywords/Search Tags: Dynamic time warping, Compressive sensing, Wavelet packet decomposition, Protein sequence, Similarity