Research On Model Of Protein Disorder Structure Prediction

Posted on: 2019-09-03 | Degree: Master | Type: Thesis
Country: China | Candidate: J E Chu
GTID: 2370330563993039 | Subject: Software engineering
Abstract/Summary:
Disordered structures are a special class of protein structure. They lack a stable spatial structure and remain flexible under natural conditions, yet they carry out essential functions. Disordered structures occur widely in living organisms, especially in eukaryotes, where 27%-41% of proteins contain disordered structure. The study of disordered proteins is significant for understanding the protein folding process, identifying protein structure, designing new artificial proteins, and developing drugs. Traditional experimental methods for determining disordered protein structures are highly accurate but time-consuming and costly. Bioinformatics methods based on machine learning are therefore also used to predict disordered protein structure. In recent years, the explosive growth of biological data, including protein structure data, has produced large-scale datasets. However, the datasets used to train machine learning models are generally small. We train the model on a large dataset with a distributed SVM method, and use an improved model structure to address the weakness of the SVM method on long disordered regions. The main work of this paper is as follows: (1) Screening and obtaining the original structure data from the PDB protein structure database. (2) Extracting and labeling the protein sequences from the structure data and computing their features. (3) Building the sample set and training the standard model, the small-dataset model, the sample-balanced model, and the two-layer model. (4) Using cross-validation and independent validation to analyze and evaluate the models' predictions. According to the results, the method used in this paper achieves some improvement over other similar methods. The proposed method is therefore reasonable and feasible, and research on bioinformatics methods based on big data is meaningful and necessary.
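Steps (2) and (3) can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the abstract does not say which features were used or how the samples were balanced. The sliding-window one-hot encoding, the random undersampling, the function names `window_features` and `balance_by_undersampling`, and the toy sequence and labels are all assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_features(sequence, half_width=2):
    """One-hot encode a window of residues centred on each position.

    Window positions that fall outside the sequence are encoded as all zeros.
    (Assumed feature scheme; the thesis may use different features.)
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for centre in range(len(sequence)):
        row = []
        for pos in range(centre - half_width, centre + half_width + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence) and sequence[pos] in index:
                one_hot[index[sequence[pos]]] = 1
            row.extend(one_hot)
        rows.append(row)
    return rows


def balance_by_undersampling(features, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    (Assumed balancing strategy; the thesis does not specify its method.)
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [features[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy input: '1' marks a disordered residue, '0' an ordered one.
sequence = "MKTAYIAKQR"
disorder = "1110000000"
labels = [int(c) for c in disorder]

X = window_features(sequence, half_width=2)
X_bal, y_bal = balance_by_undersampling(X, labels)
```

Each residue becomes one sample with one label, so a balanced sample set of this shape could be passed to any SVM implementation. Disordered residues are usually the minority class, which is why some form of balancing is a plausible reading of the "sample balanced model".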
Keywords/Search Tags:Protein, Disordered Structure, Machine Learning, Support Vector Machines