The Research And Implementation Of Classification Algorithm Based On Protein Primary Structure

Posted on: 2022-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Tao
GTID: 2480306311453354
Subject: Software engineering
Abstract/Summary:
The structure and properties of proteins are essential for understanding life processes. Protein class recognition is a key step before further study of protein function and physicochemical properties, and it has been a research hotspot in the life sciences in recent years. With the continuing improvement of computer hardware and the development of artificial intelligence and related technologies, using computer programs to extract features from and identify high-throughput protein sequences has become a common research method. Current work on protein recognition tends, on the one hand, to fuse many feature types in order to enlarge the number of sample features and, on the other hand, to adopt ever more complex classification algorithms. This improves recognition accuracy, but it also makes the recognition process cumbersome and computationally expensive. Researchers usually apply several feature extraction methods and merge the resulting features, and rarely examine whether a single feature extraction method can extract enough information from the samples to recognize them reliably. To address these problems, this thesis completes the following work.

First, to address the heavy computation and high experimental cost of current recognition methods, we take vesicular transporters as the research object. We extract features with the CTDC method, reduce the dimension of the feature space as far as possible with the MRMD dimension reduction algorithm, use a support vector machine as the classifier, and use the LibSVM toolkit to obtain the required parameters c and g. The result is a lightweight recognition model with a simple structure, low experimental cost, and easy operation. The model uses only 21 features to decide whether a protein sample is a vesicular transporter, and its recognition accuracy on the test set is 72.16%.

Second, we take the position-specific scoring matrix (PSSM) of a protein as the only sample feature, with adaptor proteins as the research object. Because protein sequences differ in length, the dimension of the feature data differs between samples; to solve this problem we design the average amino acid algorithm. We build a recognition model with a CNN as the classifier, which verifies the feasibility of the average amino acid algorithm, and we use the class weight parameters of the network to handle the severe class imbalance during model training. In addition, to explore more fully how well protein samples can be classified when the PSSM is the only feature information, we design a convolutional-recurrent neural network that connects CNN convolutional layers, a GRU network, and ordinary fully connected layers, and we use it to identify adaptor proteins among the protein samples. The experimental results show that the samples can be recognized effectively: the final recognition accuracy is 68.9% with the CNN as the classifier and 74.5% with the convolutional-recurrent neural network designed in this thesis.

Finally, we build a protein recognition system based on the convolutional-recurrent network designed and used in the experiments. The system automatically aligns the protein sequence data uploaded by users to obtain their PSSMs, and then processes them with the average amino acid algorithm. Users can either upload a data set to train a new recognition model or use an already trained model to recognize samples. The system supports online training of classification models, online sample recognition, and result output, which can be convenient for researchers in related fields.
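
As a rough illustration of the CTDC feature extraction mentioned in the first contribution, the sketch below computes the composition part of a CTD descriptor for a single physicochemical property. The abstract does not list the properties or groupings the thesis uses, so the three-class hydrophobicity grouping (polar, neutral, hydrophobic) and the function name `ctd_composition` are assumptions made for illustration only. A full CTDC feature vector would concatenate the same calculation over several properties.

```python
from typing import Dict, List

# Assumed grouping: a three-class hydrophobicity split that is commonly
# used for CTD descriptors. The thesis may use different or additional
# properties; this table is illustrative only.
HYDROPHOBICITY_GROUPS: Dict[str, str] = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def ctd_composition(
    sequence: str, groups: Dict[str, str] = HYDROPHOBICITY_GROUPS
) -> List[float]:
    """Return the fraction of residues in each group, in group order.

    Residues that belong to no group (for example 'X') count toward the
    sequence length but not toward any group, so the fractions may sum
    to less than 1.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    length = len(seq)
    return [
        sum(seq.count(residue) for residue in members) / length
        for members in groups.values()
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the thesis data set.
    print(ctd_composition("RKGACL"))
```

Because each property contributes a fixed number of values regardless of sequence length, descriptors of this kind give every protein a feature vector of the same size, which is what allows a standard classifier such as an SVM to be applied directly.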
Keywords/Search Tags: Protein classification, bioinformatics, CNN, GRU, LibSVM