
Predicting S-sulfenylation Sites Using Machine Learning

Posted on: 2019-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: G C Lei
Full Text: PDF
GTID: 2370330593951051
Subject: Computer technology
Abstract/Summary:
This thesis focuses on computational methods for predicting protein modification sites and describes our work in this area. In recent years, considerable effort has gone into predicting post-translational modifications (PTMs) in proteins. PTMs alter the side chains of peptide residues and play an important role in many biological processes. They may affect protein subcellular localization, functional activation, turnover, and interaction with other molecules. They are also associated with many complex diseases, such as Parkinson's disease, Alzheimer's disease, and some cancer-related diseases. Studying the post-translational modification of proteins is therefore of great significance. This thesis studies the prediction of S-sulfenylation sites, building on previous work. Because this modification is difficult to study and few samples are available, prediction accuracy has not been very high. We address the problem from the following two aspects in order to obtain satisfactory accuracy.

(1) We first review the feature extraction methods that previous studies have used to represent protein sequences, especially short text sequences. These include extraction based on physicochemical properties, binary encoding, PSSM profiles, and position-specific encoding of amino acids, among others. The resulting feature representations are fed into machine learning models, including random forest, support vector machine (SVM), logistic regression, and other algorithms, to test and predict protein modification sites.

(2) To address the limitations in method and accuracy noted above, we make improvements in the following respects. First, because only a limited number of protein samples can be obtained by experiment, it is difficult to extract effective information from short text sequences, and after feature representation short sequences produce more repeated features than long sequences do. We therefore improve accuracy by selecting the samples used. Second, for feature representation, the short text sequence is represented by the physicochemical properties of its amino acids, and a feature selection method is used to improve prediction accuracy.
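
As an illustration of the feature representations listed in (1), the following is a minimal sketch and not the thesis's actual implementation. It assumes that candidate sites are cysteine residues, that each site is described by a fixed-length window of flanking residues, and that window edges are padded with a placeholder symbol. The function names, the flank size, the padding symbol `X`, and the choice of the Kyte-Doolittle hydropathy scale as the example physicochemical property are all assumptions made for illustration.

```python
# Minimal sketch: window extraction plus two feature encodings.
# All names and parameters here are hypothetical, not taken from the thesis.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends

# Kyte-Doolittle hydropathy index, used here as one example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cysteine_windows(sequence, flank=3):
    """Return (1-based position, window) for every cysteine in the sequence."""
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "C":
            # In the padded string the cysteine sits at index i + flank,
            # so the window of length 2 * flank + 1 starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def binary_encode(window):
    """Binary (one-hot) encoding: 20 values per residue, all zero for padding."""
    features = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def hydropathy_encode(window):
    """One physicochemical value per residue; padding is mapped to 0.0."""
    return [KYTE_DOOLITTLE.get(residue, 0.0) for residue in window]


if __name__ == "__main__":
    for position, window in cysteine_windows("MKCAYLC", flank=2):
        print(position, window, sum(binary_encode(window)),
              hydropathy_encode(window))
```

Vectors of this kind could then be passed to a classifier such as a random forest, SVM, or logistic regression model. A short window also maps many different sites to similar or identical vectors, which is consistent with the repetition problem described in (2).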
Keywords/Search Tags:S-sulfenylation Sites, Physicochemical Properties Difference, Machine Learning, Short Text Sequence