| The structure of a protein is a prerequisite for its biological function. Understanding the relationship between protein structure and function is of great significance to biology, medicine, and pharmacy. However, the number of known protein structures is currently much smaller than the number of known sequences, and this gap will continue to widen because structure determination is complex. Predicting protein structure from sequence has therefore received extensive attention. Secondary structure serves as a bridge between the sequence and the tertiary structure, so an effective secondary structure prediction method is a practical means of, and a basis for, studying protein structure and function. Building on a thorough review of existing research, this thesis integrates work on sequence information representation and on prediction models, and proposes a new deep learning scheme for protein secondary structure prediction. On this basis, it further attempts to improve prediction performance by exploiting the relations between the three-state and eight-state classifications. The research can be summarized in the following three aspects. First, a novel deep learning model called MSTCNPP is proposed. The model uses a temporal convolutional network to handle short-range dependencies and long-range information simultaneously, and borrows the attention mechanism to capture interactions between residues in the protein sequence more accurately. With PSSM and orthogonal coding as input features, it achieves 70.6% Q8 accuracy on the CB513 data set, and its fully convolutional architecture greatly improves the parallelism of the network. Second, in addition to innovations in the prediction model, we conducted in-depth research on feature representation. After fully considering the factors that may influence the formation of secondary structure, we specially constructed two new sets of features and studied a more comprehensive and effective feature representation method. After introducing physicochemical properties and log-relative probability features, the Q8 accuracy of MSTCNPP reached 70.60%. Third, we attempt to use the relations between the three-state and eight-state classification criteria for secondary structure, which can be viewed as vital rules and constraints, to further improve prediction performance. Experimental analysis shows that the relation-aware method, multi-output simultaneous prediction (abbreviated as MOSP), gives more reasonable prediction results, with average gains in Q3 and Q8 of 0.73% and 0.48%, respectively, on the four data sets. Combining the findings of these three aspects, a Q8 accuracy of 70.74% and a Q3 accuracy of 83.68% were finally achieved on the CB513 data set, the best performance among currently known non-ensemble prediction methods. Importantly, the proposed methods can be easily transplanted to other models, demonstrating strong feasibility and practicality. |
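
To make the relation between the two classification criteria and the Q3 and Q8 metrics concrete, the following minimal sketch shows one commonly used reduction from eight states to three, together with per-residue accuracy. The mapping table, the function names, and the example label strings are assumptions for illustration only; the abstract does not specify the reduction scheme or how MOSP enforces the relation.

```python
# Illustrative sketch only: the reduction below is one common convention
# (an assumption here), not necessarily the one used in the thesis.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",  # helix types -> H
    "E": "E", "B": "E",            # strand and bridge -> E
    "T": "C", "S": "C", "L": "C",  # turn, bend, loop/other -> C
}


def reduce_to_q3(q8_labels: str) -> str:
    """Map a string of eight-state labels to three-state labels."""
    return "".join(Q8_TO_Q3[state] for state in q8_labels)


def accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted label matches the observed one.

    Applied to eight-state strings this is Q8; applied to three-state
    strings it is Q3.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def is_consistent(q3_labels: str, q8_labels: str) -> bool:
    """Check that a three-state prediction agrees with an eight-state one."""
    return reduce_to_q3(q8_labels) == q3_labels


if __name__ == "__main__":
    observed_q8 = "LHHHGTEEBL"   # made-up example labels
    predicted_q8 = "LHHHHTEEEL"  # made-up example prediction
    print("Q8:", accuracy(predicted_q8, observed_q8))
    print("Q3:", accuracy(reduce_to_q3(predicted_q8), reduce_to_q3(observed_q8)))
```

In this made-up example the two eight-state errors (G predicted as H, B predicted as E) disappear after reduction, which illustrates why Q3 is typically higher than Q8 and why the two outputs constrain each other.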