Font Size: a A A

Research And Application Of Protein Classification Based On Deep Learning

Posted on:2019-07-05Degree:MasterType:Thesis
Country:ChinaCandidate:L F ShaoFull Text:PDF
GTID:2310330563953936Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The antioxidant protein can repair DNA damage of human beings,thus it plays an important role in the treatment of cancer.So the classification of protein sequences is crucial for the prediction of oxidation in pharmacology.Since the implementation of the human genome project,protein classification problem has become an important branch of proteomics.Biological data increased exponentially every year,thus the identification of protein sequences by biochemistry experiment is very time-consuming.the development of new biological information on computer algorithm is efficient and has reliable means,besides learn to study protein classification problem,and predict the structure and function has more important and practical significance.As a new technology based on database,statistics and AI,data mining provides an unprecedented data analysis tool for biologists,and provides a powerful means for protein information analysis and extraction.This paper mainly introduces the application of deep learning method in data mining based on protein sequence classification.The main contents are as follows:1.The method of feature extraction based on the first order of protein is introduced.Protein sequence contains enough information to predict biological,physical and chemical properties of protein molecules,and the features extracted from them determine the best performance of subsequent classifiers.In this paper,we use the two peptide composition widely used in biology to describe protein sequence information.This feature extraction method does not need any other information,and has the advantages of simple and fast computation.It plays a decisive role in the performance of subsequent classifiers.2.A protein sequence classification model based on deep learning is proposed.Compared with traditional machine learning methods which rely on artificial engineering structural feature extractor,deep learning is a feature learning method,the original data through some simple but nonlinear transformation model become the abstract representation for classification.The first part of this model is feature learning composed of encoder and a fully connected network learning network.It learns abstract feature compression from the original feature vector,and then use the t-SNE method to do dimensionality reduction to transfer the learned feature to two-dimensional space,finally put the data into the SVM classifier to identify protein sequence.Experiments show that the model has high recognition effect of antioxidant protein.In the experimental data,the F1 value is 0.8842,MCC value is 0.7409,the accuracy rate is 97.05%,and the recall rate is 81.27%,which is superior to the traditional machine learning method.3.Based on the model proposed in this paper,an online antioxidant protein identification web service is developed.The service has the function of online predicting whether the amino acid sequence submitted by users is an antioxidant protein.In addition,it also provides data set download for this article,which is convenient for users to use and research.
Keywords/Search Tags:antioxidant protein classification, g-gap dipeptide composition, deep learning, autoencoder
PDF Full Text Request
Related items