Font Size: a A A

Protein Domain Prediction Using Recurrent Neural Networks

Posted on:2008-07-06Degree:MasterType:Thesis
Country:ChinaCandidate:S P LingFull Text:PDF
GTID:2120360218957811Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
Domains are basic structural units upon which structural classifications are built and functional assignments performed. Domain decomposition is performed well by human experts; however, automatic predictive methods are vital in the post-genomics era, since the flood of new structures simply overwhelm the capacity of human experts. Automation domain prediction can be implemented by template-based method and de novo method. Template-based methods have achieved great success in domain assignment; however, they failed to assign domain in the absence of structural (template) information. So,de novo of predicting domain from only sequence information is an important problem in structural biology and sequence analysis.Domain boundary prediction is usually the first step in most De novo methods. Predicting domain boundary from only sequence information can be looked as a standard binary classification problem in pattern recognition. After that, many statistical machine learning (ML) methods including Hidden Markov Model, Neural Networks, Kernel Machine have been applied to this problem by integrating all kinds of attributions about amino acids sequence into discriminative model. BRNN-based domain prediction method—Dompro is most prominent one of these ML-based methods. However, it's the overall success rate is approximately 69% in predicting the number of domain because of the limitations of algorithmThis thesis proposed a novel model for predicting domain boundary based on Long Short-Term Memory (LSTM) recurrent neural networks—IPSP-LSTM. The IPSP-LSTM integrates the advantages of traditional LSTM and Extended LSTM with Forget Gate (FG-LSTM) taking characteristics of protein sequence into account. The network can model long rang dependency in learning a protein sequence without local input window (window size is 1). The empirical research showed it can achieve higher overall accuracy (79%) in predicting the number of domain than previous ML-based (MLP, SVM, BRNN) de novo method. And it can achieve more balanceable effect than Dompro in sensitivity and specificity in predicting the domain number of multi-domain chain. It is worth mentioning that IPSP-LSTM can alleviate the learning difficulties of"dimension disaster"in the process of enlarging the local window.
Keywords/Search Tags:bioinformatics, protein domain prediction, machine learning, recurrent neural networks, Long Short-Term Memory
PDF Full Text Request
Related items