
Research On Method Of Prediction Of Protein-Protein Interaction Using Intelligent Computing

Posted on: 2011-12-19  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X Q Du
GTID: 1100360305972953  Subject: Computer application technology
Abstract/Summary:
With the successful completion of the Human Genome Project, scientists obtained a large amount of sequence information, and research moved from the genomic era into the era of functional genomics, also called the post-genomic era. Proteomics is an important branch of post-genomic research, because physiological functions in vivo depend on proteins and on the cooperative interactions between proteins and their ligands. Proteins are one of the basic materials of life. Interactions between proteins play a key role in cell function and biological pathways, and understanding these interactions also helps in studying the pathogenesis and treatment of various diseases. One of the most important challenges in proteomics is how to understand protein-protein interactions on a large scale at the physical and structural levels and to build the corresponding protein-protein interaction networks. The common approach starts from known protein primary structures and ligand sequences, extracts useful information from the sequences, and combines this information by experimental or computational methods to predict the likelihood of interaction between proteins and to construct protein interaction networks. With progress in X-ray crystallography and nuclear magnetic resonance (NMR) techniques, a large amount of protein structure data has been determined, and these data have further promoted the development of data-driven (computational) methods for predicting protein interactions. In this dissertation, we use computational intelligence algorithms to study several basic problems in protein-protein interaction. The main topics are the prediction of protein interaction sites at the micro level, the prediction of protein interactions at the macro level, and the construction of protein-protein interaction networks.
For these three areas, we carry out an in-depth analysis and propose corresponding prediction methods, as follows. (1) Based on an analysis of the surface residues and interface residues of protein-protein interfaces, we introduce a covering algorithm for predicting protein-protein interaction sites. The algorithm handles well the clustering of interface residues in both the spatial structure and the primary sequence. First, the interface residue samples and surface residue samples are mapped (through a suitable transformation) onto a sphere in an n-dimensional space, using two basic features extracted from the protein sequence: the sequence profile and the solvent-accessible surface area. We then take one interface residue sample as the center, compute its minimum distance to samples of the other class and its maximum distance to samples of the same class, and draw a circle whose radius is halfway between these two distances, which forms a cover containing samples of the same class. The center is then moved to a sample of the other class and a cover is constructed in the same way, and the two classes alternate in this manner. According to the characteristics of the data, we constructed two experimental datasets (the Complete dataset and the Trim dataset) and compared our method with traditional machine learning algorithms (SVM, ME) on both. The experimental results show that the algorithm is effective and feasible on both datasets. Finally, we give two examples of locating protein interaction sites with several algorithms, which further show that the algorithm adapts well to unknown protein interaction sites and predicts them well. (2) Many features can be used to predict protein interaction sites, and different researchers obtain different results from different feature combinations.
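The cover construction described in item 1 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it uses plain Euclidean distance instead of the projection onto a sphere, it reads "maximum distance to samples of the same class" as the maximum over same-class samples that lie closer than the nearest sample of the other class, and the names `build_covers` and `predict` are hypothetical.

```python
import math

def build_covers(samples, labels):
    """Greedily build (center, radius, label) covers, alternating between classes."""
    classes = sorted(set(labels))
    uncovered = set(range(len(samples)))
    covers, turn = [], 0
    while uncovered:
        label = classes[turn % len(classes)]  # alternate the class of the next center
        turn += 1
        candidates = sorted(i for i in uncovered if labels[i] == label)
        if not candidates:
            continue  # this class is already fully covered
        center = samples[candidates[0]]
        # Minimum distance from the center to any sample of another class.
        d_other = min(
            (math.dist(center, s) for s, l in zip(samples, labels) if l != label),
            default=math.inf,
        )
        # Maximum distance to same-class samples closer than d_other (assumption).
        same = [math.dist(center, s) for s, l in zip(samples, labels) if l == label]
        d_same = max((d for d in same if d < d_other), default=0.0)
        # Radius halfway between the two distances, so no other-class sample is inside.
        radius = d_same if math.isinf(d_other) else (d_other + d_same) / 2
        covers.append((center, radius, label))
        uncovered -= {
            i for i in uncovered
            if labels[i] == label and math.dist(center, samples[i]) <= radius
        }
    return covers

def predict(covers, x):
    """Label x by the cover whose boundary is nearest (a negative gap means inside)."""
    _, _, label = min(covers, key=lambda c: math.dist(c[0], x) - c[1])
    return label

# Toy 2-D data standing in for residue feature vectors (made up for illustration).
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["interface"] * 3 + ["surface"] * 3
covers = build_covers(samples, labels)
print(len(covers), predict(covers, (0.5, 0.5)), predict(covers, (5.5, 5.5)))
```

Because each radius is strictly smaller than the distance to the nearest sample of the other class, a cover never contains a training sample of the wrong class, except in the degenerate case where two samples of different classes coincide.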
Because different features provide information about protein interaction sites from different angles, some features contribute nothing to the predictive power of the classifier and may even degrade its results. We therefore focus on feature selection for protein interaction site prediction and propose a new feature extraction algorithm that combines a genetic algorithm (GA) with a support vector machine (SVM). The algorithm uses the GA to extract the 68 relatively most important features from the original 110-dimensional protein sequence vector and uses the SVM to evaluate the extracted features. The fitness function of each individual is set to the F1-measure, which balances sensitivity and specificity; this helps to find a classifier whose performance is balanced across all measures. We designed experiments with a random classifier, a two-stage classifier, an SVM, and the GA/SVM classifier. The experimental results show that the proposed GA/SVM feature extraction algorithm is robust and performs better than the other methods and than the original feature set. (3) A key problem in protein-protein interaction prediction is how to convert protein sequence information effectively, because different conversion methods carry different amounts of information and therefore lead to different classification performance. We therefore propose a method based on amino acid sequence-order information (pseudo amino acid composition, PseAA), which takes into account not only the basic amino acid composition but also the short-range, medium-range, and long-range interactions between amino acids in the protein representation. We use an SVM to learn and classify the new protein sequence encoding, and, for performance comparison, we also implemented three other conversion methods: correlation coefficient (CC), auto covariance (AC), and amino acid composition (AAC).
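The GA-based feature selection with an F1-measure fitness described in item 2 above can be illustrated with a small, dependency-free sketch. Several parts are assumptions made for illustration: a nearest-centroid classifier stands in for the SVM, F1 is computed in its standard form as the harmonic mean of precision and recall, the GA operators (two elites, one-point crossover, bit-flip mutation) and all parameter values are arbitrary choices, and the function names are hypothetical.

```python
import random

def f1_score(y_true, y_pred, positive=1):
    """Standard F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def centroid_predict(train_X, train_y, test_X, mask):
    """Nearest-centroid stand-in for the SVM, restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    def nearest(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
    return [nearest(x) for x in test_X]

def fitness(mask, train, valid):
    """F1 on the validation split for the feature subset encoded by mask."""
    if not any(mask):
        return 0.0
    return f1_score(valid[1], centroid_predict(train[0], train[1], valid[0], mask))

def select_features(train, valid, n_features, pop_size=20, generations=30,
                    p_mut=0.05, seed=0):
    """Evolve 0/1 feature masks; the all-features mask is in the initial population."""
    rng = random.Random(seed)
    pop = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    score = lambda m: fitness(m, train, valid)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[:2]  # elitism keeps the best masks unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the better half
            cut = rng.randrange(1, n_features)         # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)

# Synthetic data (made up): feature 0 is informative, the other five are noise.
_rng = random.Random(1)
def make_split(n):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + _rng.gauss(0, 0.3)] + [_rng.gauss(0, 1) for _ in range(5)])
        y.append(label)
    return X, y

train, valid = make_split(40), make_split(40)
best = select_features(train, valid, n_features=6)
print(best, fitness(best, train, valid))
```

Because the initial population contains the mask that keeps every feature and the two best masks survive each generation, the returned subset never scores below the full feature set on the validation split.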
The experimental results show that the proposed sequence encoding scheme ranks second in classification performance. Among the four methods, AC produces the highest dimension, 840; CC follows with 420; AAC is the lowest, with only 40; and the proposed method has dimension 100. Therefore, in terms of both performance and cost, the proposed sequence conversion method for protein-protein interaction prediction is effective and feasible. (4) Protein interaction prediction considers only the level of individual proteins, whereas many functions of life are related to protein interaction networks. We therefore use the classification model obtained above and extract two types of interaction network data from the BioGRID database for testing. We convert the protein sequences of the interaction networks into the corresponding discrete vectors with the pseudo amino acid composition method, use the classification model to predict them, and finally draw the predicted protein interaction networks. The experimental results show that this method is also effective for constructing protein interaction networks.
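The pseudo amino acid composition encoding used in items 3 and 4 can be sketched as follows. This is a hedged illustration based on the commonly used type-1 formulation (20 composition terms plus λ sequence-order correlation terms), not the dissertation's exact scheme: it uses a single physicochemical property (the Kyte-Doolittle hydropathy scale) as a stand-in, a weight of 0.05, and hypothetical function names. The abstract does not state λ; if each protein of a pair were encoded with λ = 30, the concatenated pair vector would have 2 × (20 + 30) = 100 components, which would match the reported dimension, but that value is an assumption.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used as a single stand-in property (assumption).
KD = {"A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
      "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
      "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3}
AMINO_ACIDS = "".join(sorted(KD))

# Standardize the property to zero mean and unit variance over the 20 amino acids.
_mu, _sd = mean(KD.values()), pstdev(KD.values())
PROP = {aa: (v - _mu) / _sd for aa, v in KD.items()}

def pse_aac(seq, lam=5, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    residues = [aa for aa in seq.upper() if aa in PROP]  # drops non-standard letters
    if len(residues) <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition (20 frequencies).
    freqs = [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]
    # Sequence-order correlation factors for gaps k = 1..lam.
    thetas = [
        mean((PROP[a] - PROP[b]) ** 2 for a, b in zip(residues, residues[k:]))
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]

def encode_pair(seq_a, seq_b, lam=5, weight=0.05):
    """Concatenate the encodings of two proteins into one pair vector."""
    return pse_aac(seq_a, lam, weight) + pse_aac(seq_b, lam, weight)

# Toy sequences (made up); real inputs would be full protein sequences.
vec = pse_aac("ACDEFGHIKLMNPQRSTVWYACD", lam=3)
pair = encode_pair("ACDEFGHIKLMNPQRSTVWY" * 2, "MKTAYIAKQR" * 4, lam=30)
print(len(vec), len(pair))
```

The first 20 components capture composition, and the last λ components capture how the chosen property varies between residues k positions apart, which is how the encoding retains short-range to long-range order information that plain amino acid composition discards.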
Keywords/Search Tags:protein-protein interaction, intelligent computing, covering algorithm, interaction network, pseudo amino acid composition