
Research on Biological Sequence Analysis and RNA-Binding Protein Recognition Based on Sequence Features

Posted on: 2021-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Gao
GTID: 2370330611999761
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of technologies such as gene sequencing, the volume of biological sequence data has increased dramatically, and this growth has driven major advances in many fields of biology. At the same time, many biological problems, such as residue-level sequence analysis, RNA-binding protein recognition, and protein disordered region prediction, remain to be explored. For large-scale sequence data, traditional biological experiments alone cannot deliver good performance. Finding or designing novel analysis methods for massive biological sequence data is therefore an effective line of research, helping researchers solve related problems by mining the information hidden in the data. The theory of artificial intelligence is steadily maturing, as is the big data ecosystem, and more and more researchers are applying machine learning and data mining to exploit the potential of the data when addressing problems related to biological sequences.

Traditional approaches to identifying RNA-binding proteins rely on biological experiments. However, some of these methods are not accurate, and others are time-consuming and labor-intensive, so they cannot meet the needs of research. Consequently, in this thesis we studied the sequence features of RNA-binding proteins, used several sequence feature extraction methods to explore sequence information, and combined them with machine learning algorithms to construct classifiers. The main contents of this thesis are as follows.

First, we present BioSeq-Analysis2.0, a substantially updated server covering a total of 26 features at the residue level and 90 features at the sequence level. Users only need to upload a benchmark dataset, and BioSeq-Analysis2.0 automatically generates predictors for both residue-level and sequence-level analysis tasks. To the best of our knowledge, BioSeq-Analysis2.0 is the first tool for generating predictors for biological sequence analysis tasks at the residue level. For residue-level analysis tasks, a sliding window approach is applied to extract information from sequentially neighboring residues, and a sequence labelling model, the Conditional Random Field (CRF), was added to BioSeq-Analysis2.0 to capture the global sequence-order information of residues. We also built a stand-alone package of BioSeq-Analysis2.0. Experimental results on three problems indicate that the predictors developed with BioSeq-Analysis2.0 achieve comparable or even better performance than existing state-of-the-art predictors. We anticipate that BioSeq-Analysis2.0 will be widely used for biological sequence analysis at both the sequence level and the residue level.

Second, this thesis proposes a novel predictor called iRBP-Motif-PSSM for identifying RNA-binding proteins by combining sequence information with a collaborative learning strategy. Feature vectors are extracted from motif information, from the evolutionary information contained in position-specific scoring matrices (PSSMs), and through several sequence feature extraction methods, and the classifiers are then constructed with the support vector machine algorithm. Finally, a collaborative learning ensemble method is used to build an ensemble model. The experimental results show that iRBP-Motif-PSSM outperforms other existing state-of-the-art methods for identifying RNA-binding proteins, indicating that iRBP-Motif-PSSM is a useful tool for biological sequence analysis.
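The sliding window approach mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the server's actual implementation: the function names, the padding symbol `X`, the window half-width, and the one-hot encoding are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List

# Assumptions for illustration only (not taken from the thesis):
# the padding symbol, the 20-letter alphabet order, and one-hot encoding.
PAD = "X"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sliding_windows(sequence: str, half_width: int) -> List[str]:
    """Return one window of length 2*half_width + 1 centered on each residue.

    Positions past either end of the sequence are filled with PAD so that
    every residue, including the first and last, gets a full-size window.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def one_hot_window(window: str) -> List[int]:
    """Encode a window as concatenated 20-dimensional one-hot vectors.

    Symbols outside AMINO_ACIDS (such as PAD) become all-zero blocks.
    """
    vector: List[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    for center, window in zip("MKTAY", sliding_windows("MKTAY", 1)):
        print(center, window, sum(one_hot_window(window)))
```

Each residue then has a fixed-length feature vector describing its local neighborhood, which a per-residue classifier or a sequence labelling model such as a CRF could consume.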
Keywords/Search Tags: biological sequence analysis, residue-level sequence analysis, RNA-binding protein identification, Motif-PSSM feature extraction, collaborative learning