Protein Sequence Pattern Discovery Algorithm

Posted on: 2010-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Niu
GTID: 2208360275483570
Subject: Computer application technology

Abstract/Summary:
Bioinformatics is a science that has grown out of the interaction of modern biology, mathematics, informatics, computer science, statistics, physics and chemistry. It studies the collection, storage, transfer, search, analysis and interpretation of biological information. The exponential growth of biological data has given rise to a new multidisciplinary research area, computational biology, and with it new challenges for research on data mining, machine learning and statistical learning. One of the major research problems in computational biology is predicting protein structure from protein sequence. From the perspective of computer science, this is a classification problem, and building effective and efficient classification models is an active topic in data mining, machine learning and statistical learning.

Sequence alignment is a basic and important tool in bioinformatics, and research on fast and sensitive biological sequence alignment algorithms is a current focus of the field. This thesis gives a definition of sequence alignment, reviews the current state of research on alignment algorithms, and describes their advantages, limitations and areas of application. It then points out open problems and directions for further development.

Regarding protein structure, two sequences that share similar substrings often have similar functional properties. The characteristics of an unknown protein are much easier to determine if its likely functional properties can be predicted by finding substrings already known from other protein sequences. The sequence pattern search algorithm proposed in this thesis uses fuzzy logic to search for approximate matches between a pattern and a sequence, and calculates the degree of similarity in a sequence inference step.
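
The abstract does not give the details of the fuzzy matching step, so the following is only a minimal sketch of the general idea, not the method of the thesis. It assumes a pattern written as a list of allowed-residue sets (with `None` as a wildcard), a hypothetical grouping of amino acids into similar classes (`GROUPS`), and an assumed partial-match degree of 0.5; the degree of similarity of a window is taken to be the mean membership over its positions.

```python
# Hypothetical residue groups, chosen for illustration only (not from the thesis).
GROUPS = [set("AVLIM"), set("FWY"), set("KRH"), set("DE"),
          set("STNQ"), set("C"), set("G"), set("P")]


def membership(residue, allowed):
    """Fuzzy membership of one residue in one pattern position (0.0 to 1.0)."""
    if allowed is None or residue in allowed:  # None stands for the wildcard 'x'
        return 1.0
    # Partial credit when the residue shares a group with an allowed residue.
    for group in GROUPS:
        if residue in group and group & allowed:
            return 0.5  # assumed partial-match degree
    return 0.0


def fuzzy_search(pattern, sequence, threshold=0.8):
    """Return (start, degree) for each window whose mean membership >= threshold."""
    hits = []
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        degree = sum(membership(r, a) for r, a in zip(window, pattern)) / width
        if degree >= threshold:
            hits.append((start, degree))
    return hits


# The pattern C-x-[DE]-K expressed as allowed-residue sets.
pattern = [set("C"), None, set("DE"), set("K")]
hits = fuzzy_search(pattern, "MACGDKLLCAERT", threshold=0.8)
print(hits)  # [(2, 1.0), (8, 0.875)]
```

In this toy run the window `CGDK` matches the pattern exactly, while `CAER` is reported with a lower degree because `R` only shares a group with the required `K`. An exact pattern matcher would miss the second hit, which is the kind of case a fuzzy search is meant to recover.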
The results show that the proposed algorithm can identify sequences whose patterns are similar to the motifs of their protein family.

The main work is as follows. First, we describe existing alignment algorithms, such as Smith-Waterman, BLAST and FASTA, and discuss the potential strengths and weaknesses of the most widely used alignment packages. Second, we analyze the Pratt algorithm in detail and test it on three different data sets from the PROSITE protein database. Third, we describe in detail the proposed sequence pattern search algorithm, which uses fuzzy logic to search for approximate matches between a pattern and a sequence and calculates the degree of similarity in a sequence inference step, and we run the same tests on it as on the Pratt algorithm. Finally, we compare and analyze the test data and discuss the potential strengths and weaknesses of the algorithms.
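
Of the alignment algorithms named above, Smith-Waterman is the simplest to illustrate. The sketch below computes only the best local alignment score using the standard dynamic-programming recurrence. The scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration; practical protein alignment normally uses a substitution matrix and affine gap penalties, and nothing here is taken from the thesis.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score of aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGACDKGG", "TTACDKTT"))  # 8
```

The two sequences share the substring `ACDK`, so the best local score is four matches at +2 each. Only two rows of the matrix are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full matrix for traceback.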
Keywords/Search Tags: protein sequence pattern, pattern-driven, fuzzy sequence searching