Study On The Methods For Predicting The Related Issues Of Protein Function

Posted on: 2016-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zha
Full Text: PDF
GTID: 2180330461992499
Subject: Computer application technology
Abstract/Summary:
With the successful completion of the Human Genome Project and the wide application of new, efficient experimental techniques, massive amounts of genetic sequence information have been generated, and life science research has formally entered the post-genomic era. Proteins are involved in all aspects of life activity, and protein research focuses mainly on analyzing protein structure and function on the basis of protein sequence. Protein structure and function are closely related: the specific function of a protein is determined by its unique structure. The basic processes of life arise from the combined action of proteins with different functions under particular conditions, and protein interactions are almost ubiquitous in organisms. The study of protein structural class and protein-protein interaction therefore not only helps to determine protein function, reveal the essence of life activity, and understand the mechanisms of biological processes, but also helps to analyze related diseases and develop drug treatments.

Studying problems related to protein function is essential and urgent, and methods for predicting protein structural class and protein-protein interaction are an active and difficult research topic. Traditional experimental methods cannot meet the demand for large-scale testing. Computational approaches overcome this shortcoming to some extent and greatly speed up determination. Nevertheless, some problems remain.

This dissertation presents two novel prediction methods, one based on sequence information and the other based on multi-instance learning. For protein structural class prediction, the dissertation constructs a new feature encoding that aims to reflect the real protein structure as fully as possible, rather than only the composition of the sequence. For protein-protein interaction prediction, the prediction is made directly from domain information, using the way multi-instance learning handles uncertainty in sample labels.

The main work of this dissertation is as follows:

1. The computational methods for predicting protein structural class and protein-protein interaction are classified and summarized. The theoretical basis of the different methods and their advantages and disadvantages are also described briefly.

2. This dissertation puts forward a method for predicting protein structural class based on the autocorrelation coefficient and pseudo-amino acid composition, approached from the perspective of sequence feature construction. Previous sequence-based feature vectors generally consider only the proportions of the twenty amino acids in the protein sequence, that is, composition information. Consequently, they tend to neglect the order of the residues and the coupling between them. Combining two sequence encodings, the autocorrelation coefficient and pseudo-amino acid composition, reflects the positions of amino acids in the sequence. It also captures the relationships between amino acids at different distances from each other within the sequence, so that the structural information of the protein is represented more faithfully. In a large number of comparative experiments on recognized data sets and a self-constructed data set, the proposed method improves accuracy by 14.49%, 8.33%, and 2.78% over the traditional amino acid composition method, which shows that the new method can improve prediction accuracy.

3. This dissertation proposes a novel method for predicting protein-protein interaction based on domain information and multi-instance learning. Conventional domain-based methods generally need to determine which domains interact within known interacting protein pairs before they can predict unknown protein interactions. In practice, this detailed information is often difficult to obtain. Multi-instance learning addresses the problem because a bag carries a label while the instances inside the bag do not. This dissertation treats each protein pair as a bag and each domain pair within the protein pair as an instance, so there is no need to know in advance which domains interact. The method leaves open the question of whether a particular instance causes the protein interaction, and it simplifies the steps of the traditional domain-based approach. On self-constructed data sets, experiments are performed with multi-instance learning algorithms and with common machine learning algorithms. The experimental results show that the method is effective.
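The two representations described in items 2 and 3 can be illustrated with a minimal sketch. It is not the thesis implementation, and the abstract does not give the exact formulas. The sketch assumes the Kyte-Doolittle hydropathy index as the physicochemical property, a simple mean-product form of the autocorrelation coefficient, and the standard multi-instance assumption that a bag is positive when at least one instance is positive. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. ASSUMPTION: the abstract does not say
# which physicochemical property the thesis uses; this is one common choice.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard amino acids; ignores residue order."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """Mean product of property values for residues `lag` positions apart.

    ASSUMPTION: a simple illustrative form; max_lag must be < len(seq),
    and seq must contain only the 20 standard residues.
    """
    values = [HYDROPATHY[r] for r in seq]
    n = len(values)
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode(seq, max_lag=3):
    """Concatenate composition features with order-sensitive features."""
    return composition(seq) + autocorrelation(seq, max_lag)


def domain_pair_bag(domains_a, domains_b):
    """Bag for one protein pair: every cross-protein domain pair is an instance."""
    return [(a, b) for a in domains_a for b in domains_b]


def bag_label(bag, instance_is_positive):
    """Standard MIL assumption: a bag is positive if any instance is positive.

    In training only the bag label is observed; the instance-level
    predicate here is a stand-in for a learned model.
    """
    return any(instance_is_positive(inst) for inst in bag)
```

For example, the sequences "AACC" and "ACAC" have identical composition vectors, but their lag-1 autocorrelation values differ, which is the order information that composition alone discards. Likewise, `domain_pair_bag(["D1", "D2"], ["D3"])` yields two instances for a single labeled protein pair, with no label attached to either domain pair.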
Keywords/Search Tags:Multi-instance Learning, Protein Structure Class, Protein-protein Interactions, Autocorrelation Coefficient, Pseudo-amino Acid Composition