The Construction Of Protein Mutation Site Database And Site Prediction Study

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Liu
Full Text: PDF
GTID: 2370330575971070
Subject: Biology
Abstract/Summary:
As biological data continue to grow, researchers increasingly rely on computers to analyze them. Proteins, as one class of biological macromolecules, have long been a focus of research. As protein structure determination techniques have matured, researchers have obtained a large number of protein crystal structures, which also provide data support for the study of protein-protein interactions (PPIs). Protein-protein interactions strongly affect the integrity of life activities by controlling biological pathways. As is well known, hot spot residues act as functional sites at the protein-protein interaction interface and regulate the entire interaction process. In recent years, researchers have further investigated the role of protein-protein interactions in cellular life activities by resolving hot spot residues. In this thesis, we first collect biological data on protein interactions, construct a kinetic and thermodynamic database of mutant protein interactions, and build an ensemble learning auto-correlation model to predict hot spot residues at the interfaces of protein complexes. The specific research contents are summarized as follows:

1. Construction of a kinetic and thermodynamic database of mutant protein interactions. Building on databases compiled by previous researchers, data were collected from two sources. First, existing databases that store thermodynamic and kinetic data for mutant proteins, including SKEMPI, BID and AB-Bind, were collected and integrated. Second, thermodynamic and kinetic data for mutant proteins newly reported in the past three years were obtained through literature mining. The literature search followed two routes. In the first, starting from protein structure, protein complexes were identified by keyword search and compared against the PDB-Bind database to retain complexes with Kd values; the corresponding literature was then retrieved to collect the required data. In the second, literature published in the past three years was searched by keyword, and the thermodynamic and kinetic data of mutant proteins were extracted by reading it. In total, 5291 mutants derived from 341 protein complexes were obtained. From these mutation data, dbMPIKT, a database of the thermodynamics and kinetics of mutant protein interactions, was constructed. Users can browse, query and download the mutation data through the website. In addition, a simple statistical analysis of the mutation data was performed and a protein interaction network was created with the Cytoscape tool; users can view this biological analysis of the mutation data in the file interface of the website. The dbMPIKT database therefore provides more comprehensive mutant data, updated with the past three years of results, making it easier for researchers to obtain mutant data.

2. Construction of an ensemble learning auto-correlation classifier to predict hot spot residues, the functional sites at the PPI interface. Based on the constructed thermodynamic and kinetic database of mutant protein interactions, the resulting data set was used to predict hot spot residues. First, following previous studies, five data sets were selected: ASEdb, BID, SKEMPI, dbMPIKT and a constructed mixed data set. ASEdb and BID are the standard data sets used for training and testing, and the other three are used as independent test sets. To increase the reliability of the model, the three data sets were integrated to obtain a larger data set used as an independent test set. Second, an auto-correlation function method was proposed for encoding amino acid sequences. After screening the indices in AAindex1, 46 physicochemical properties of amino acids were selected to characterize the amino acid sequence, and the auto-correlation function was then combined with a sliding window to produce the final features. For the classifier, an ensemble was constructed that combines a support vector machine and the K-Nearest Neighbor algorithm; it was trained and tested to obtain the final prediction model.

This thesis constructs a biological database of mutant protein interactions and an effective model that predicts hot spot residues well. It aims to study protein interaction data and prediction models for hot spot residues, and to provide a data foundation and research ideas for researchers working on protein function.
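The auto-correlation encoding with a sliding window mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, window size, lag range, or the list of 46 AAindex1 properties, so everything below is an assumption for illustration: the Kyte-Doolittle hydropathy scale stands in for one property, the window is truncated at the sequence ends, and the function names (`standardize`, `window`, `autocorrelation`, `encode_residue`) are hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale, used here only as a stand-in property.
# The thesis selects 46 properties from AAindex1; the abstract does not list them.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def window(sequence, center, half_width):
    """Return the residues within half_width of center, truncated at the sequence ends."""
    start = max(0, center - half_width)
    end = min(len(sequence), center + half_width + 1)
    return sequence[start:end]


def autocorrelation(values, max_lag):
    """AC(lag) = mean of values[j] * values[j + lag] over all valid j, for lag = 1..max_lag."""
    n = len(values)
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            # Window too short for this lag (assumed convention: pad with 0.0).
            features.append(0.0)
            continue
        total = sum(values[j] * values[j + lag] for j in range(n - lag))
        features.append(total / (n - lag))
    return features


def encode_residue(sequence, center, scales, half_width=5, max_lag=3):
    """Concatenate the auto-correlation features of every property for one residue."""
    segment = window(sequence, center, half_width)
    features = []
    for scale in scales:
        values = [scale[aa] for aa in segment]
        features.extend(autocorrelation(values, max_lag))
    return features


if __name__ == "__main__":
    scales = [standardize(HYDROPATHY)]   # the thesis would use 46 such tables
    sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    print(encode_residue(sequence, center=10, scales=scales))
```

With 46 property tables and `max_lag=3`, each residue would be described by 46 × 3 = 138 values under these assumptions. A feature vector of this kind could then be passed to the support vector machine and K-Nearest Neighbor classifiers that the abstract combines.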
Keywords/Search Tags: protein-protein interaction, thermodynamic and kinetic data, database, hot spot residues, ensemble learning