Study On Semi-supervised Generative Adversarial Network Models For Predicting Protein Secondary Structures

Posted on: 2020-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhao
GTID: 2480306518466874
Subject: Computer technology
Abstract/Summary:
With the development of high-throughput sequencing technology, predicting protein secondary structure from sequence information has become one of the active research topics in computational biology. Current machine learning methods build supervised learning models that rely on a large number of annotated samples. However, obtaining secondary structure annotations often requires extensive biological experiments and manual correction, which is time-consuming and costly. This thesis proposes a semi-supervised generative adversarial network model that predicts protein secondary structure using only a small number of labeled samples. The main contributions are:

(1) The data set used in the experiments was cleaned, and the primary structure of each protein in the data set was extracted. The position-specific scoring matrix of the primary structure was used as the input feature. To meet the conditions of semi-supervised learning during training, a vector (label_masked) is set for the training set. Its purpose is to occlude the secondary structure labels of labeled amino acids, so that training proceeds in a semi-supervised manner.

(2) A semi-supervised deep learning model built on the idea of generative adversarial networks is proposed, which predicts protein secondary structure using fewer labeled samples. The model is trained in a semi-supervised manner on a large amount of unlabeled data, which avoids the laborious annotation work that traditional supervised prediction of protein secondary structure requires.

(3) The parameters of the model are trained. After training with missing data, the model achieves a Q8 prediction accuracy of ~70.2% and a Q3 prediction accuracy of ~81.8% on the CullPDB6133 test set. On the independent data set CB513, the Q8 and Q3 prediction accuracies reach ~66.2% and 79.8%.
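The label occlusion and the Q8/Q3 metrics described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `make_label_mask`, `accuracy`, and `to_q3`, the `labeled_fraction` parameter, and the toy sequences are hypothetical, and the 8-state to 3-state reduction shown (H, G, I to helix; E, B to strand; the rest to coil) is one commonly used mapping that the abstract itself does not specify. Q8 and Q3 are taken here as per-residue accuracy over eight and three structure classes respectively.

```python
import random

# Assumed 8-state -> 3-state reduction; the abstract does not state which one was used.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",   # helix types
    "E": "E", "B": "E",             # strand types
    "T": "C", "S": "C", "L": "C",   # turn, bend, loop -> coil
}


def make_label_mask(n_residues, labeled_fraction, seed=0):
    """Return a 0/1 list: 1 keeps a residue's label, 0 occludes it.

    Hypothetical stand-in for the role of the label_masked vector.
    """
    rng = random.Random(seed)
    n_labeled = round(n_residues * labeled_fraction)
    keep = set(rng.sample(range(n_residues), n_labeled))
    return [1 if i in keep else 0 for i in range(n_residues)]


def accuracy(predicted, actual):
    """Per-residue accuracy: fraction of positions where the labels agree."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def to_q3(labels):
    """Collapse 8-state labels to 3 states using the assumed mapping."""
    return [Q8_TO_Q3[s] for s in labels]


if __name__ == "__main__":
    actual = list("HHHGEEBTSL")      # toy ground-truth 8-state labels
    predicted = list("HHGGEEETSS")   # toy model output

    mask = make_label_mask(len(actual), labeled_fraction=0.3)
    # Only residues with mask == 1 would contribute to a supervised loss term.
    visible = [lab if m else None for lab, m in zip(actual, mask)]

    print("mask:   ", mask)
    print("visible:", visible)
    print("Q8:", accuracy(predicted, actual))
    print("Q3:", accuracy(to_q3(predicted), to_q3(actual)))
```

In the toy example the Q3 score is higher than the Q8 score because confusions within a class (for example H versus G) no longer count as errors after the reduction, which is consistent with the Q3 figures in the abstract exceeding the Q8 figures.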
Keywords/Search Tags: Protein Secondary Structure, Generative Adversarial Networks, Semi-supervised Learning, Position-specific Scoring Matrix, Deep Learning