Predict Protein-Nucleotide Binding Site By Using Improved AdaBoost And KNN Algorithm

Posted on:2016-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:X Xin

Full Text:PDF

GTID:2180330482954845

Subject:Computer technology

Abstract/Summary:

With the continuous development of computer and network technology, humanity has entered the era of big data, and computer science has had a profound influence on biology, as it has on every other discipline. With the arrival of the post-genome era, protein sequencing technology has developed rapidly, producing explosive growth in protein sequence data. Compared with primary sequences, however, information about protein structure and function matters more, because understanding it greatly advances biology, the life sciences, pharmaceutical engineering, and related fields. Many researchers have therefore devoted themselves to studying the spatial structure and function of proteins. Early experimental methods from biology, with their high cost in time and money, can no longer fully meet this demand, and bioinformatics emerged in response. Researchers began to predict protein structure and function computationally and have achieved encouraging results.

Proteins do not exist in isolation in the body; they carry out specific functions by interacting with other molecules. The molecules that interact with proteins are collectively called ligands, and nucleotides are an important class of ligand with distinctive characteristics. Understanding the mechanism of protein-nucleotide interaction plays an important role in understanding protein function, so identifying protein-nucleotide interaction sites has become a very active research topic in recent years.

The KNN (k-nearest neighbors) classifier is a long-established and practical method; it is robust and stable and is widely used in machine learning and data mining. Its basic idea is to find, among the training samples, the K samples "closest" to the sample under test, and to determine that sample's class from the class distribution of those K samples.
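
The nearest-neighbor idea described above can be illustrated with a minimal sketch. It assumes Euclidean distance and an unweighted majority vote; the function name `knn_predict` and the toy 2-D data are hypothetical and are not taken from the thesis, which works on protein sequence features.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest samples and count their class labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data (hypothetical): class 0 near the origin, class 1 near (1, 1).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(points, labels, (0.05, 0.05))
near_one = knn_predict(points, labels, (1.0, 0.95))
```

Here `near_origin` takes the label of the cluster around the origin and `near_one` the label of the cluster around (1, 1), because each query's three nearest neighbors all belong to one class.
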
In biology, research has shown that the more similar two protein sequences are, the more likely they are to have similar structure and function. This simple and intuitive method has therefore achieved quite competitive results in protein-nucleotide binding site prediction. KNN still has a serious shortcoming, however: its prediction performance drops significantly when the samples are unevenly distributed across classes. Protein-nucleotide binding site data suffer from severe sample tilt (class imbalance), with a very large difference between the numbers of positive and negative samples. To address this problem, this thesis proposes the A-KNN algorithm. Based on the AdaBoost algorithm, A-KNN undersamples the training set to form N weak training sets, builds a weak classifier on each of them with an improved KNN algorithm, and then integrates the N weak classifiers into a strong classifier that produces the final prediction.

Experimental results show that A-KNN significantly improves on the original KNN algorithm in both accuracy and MCC (Matthews correlation coefficient). When noisy data are added artificially, the algorithm reduces the effect of the noise on the classification results. In comparison with well-performing existing algorithms, A-KNN also improves on both accuracy and MCC. Extensive tests verify that the method can effectively improve the prediction of protein-nucleotide binding sites.
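
The abstract does not say how KNN is improved or how AdaBoost weights enter the method, so the following is only a rough sketch of the general pattern it names: undersample the majority class into N balanced training sets, fit one KNN classifier per set, and combine them. It assumes plain KNN, random undersampling, and an unweighted majority vote in place of the thesis's boosting scheme. All function names and the toy data are hypothetical. The `mcc` helper uses the standard Matthews correlation coefficient formula.

```python
import math
import random
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Plain KNN: majority label among the k nearest training points."""
    nearest = sorted(zip(points, labels), key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def undersampled_sets(points, labels, n_sets, rng):
    """Build n_sets balanced sets: every positive plus an equal-sized random draw of negatives."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    sets = []
    for _ in range(n_sets):
        drawn = rng.sample(neg, min(len(pos), len(neg)))
        sets.append((pos + drawn, [1] * len(pos) + [0] * len(drawn)))
    return sets

def ensemble_predict(sets, query, k=3):
    """Unweighted majority vote over one KNN classifier per training set (assumption)."""
    votes = [knn_predict(p, y, query, k) for p, y in sets]
    return 1 if 2 * sum(votes) > len(votes) else 0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Imbalanced toy data (hypothetical): 3 positives near (1, 1), 30 negatives near the origin.
positives = [(1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
negatives = [(i * 0.05, j * 0.05) for i in range(6) for j in range(5)]
points = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

sets = undersampled_sets(points, labels, n_sets=5, rng=random.Random(0))
queries = [(1.0, 1.0), (0.1, 0.1)]
predictions = [ensemble_predict(sets, q) for q in queries]
score = mcc([1, 0], predictions)
```

Each balanced set keeps all positive samples, so the minority class is never outvoted simply because negatives are more numerous. A boosting-style variant would instead weight each weak classifier by its training error, which this sketch deliberately omits.
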

Keywords/Search Tags:

Protein, Nucleotides, AdaBoost, KNN, Sample tilt (class imbalance)

Related items

1	Research On The Effects Of Sample Tilt On Nanoindentation Test Of Berkovich Indenter
2	Study Of The Interaction Of Native DNA And Nucleotides With Compounds Of Germanium
3	The Protein Prediction Of Folding Structure Based On The Content Of Pyrimidine Nucleotides In The MRNA
4	Studies On The Antioxidant Ability Of Nucleotides And Their Effects On Antioxidant Genes Of Mice
5	Study On The Protective Effect And The Mechanism Of Heterology Prime-boost Immunization With Streptococcus Pneumoniae Vaccines
6	Study On Tilt Principle And Correct Orientation Method For Three-Dimensional Laser Scanning Point Cloud
7	Development Of Biological Sample Processing Equipment And Ground Verification
8	Rapid Collecting And Processing Of Ground Sample Data And It's Expand Application
9	Study On The Dynamics Of Boost Converter Based On Chaotic Control Method
10	Three Dimensional Modeling And Precision Analysis Of Multi Tilt Rotor Single Shot UAV Tilt Photography