
Application Research On Prediction Of Protein Function Using Deep Learning

Posted on: 2021-05-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Liu
GTID: 2480306197995829 | Subject: Master of Engineering
Abstract/Summary:
Proteins are organic macromolecules with spatial structures that carry out the main life activities of cells. In protein sequences, the arrangement of different amino acids is the key factor that gives proteins their different functions. Studying protein function by analyzing protein sequence data therefore has important theoretical and practical significance, and predicting protein function with the efficient computing power of computers has become an important research direction. This work predicts protein function by analyzing the characteristics of protein sequences and constructing different deep learning algorithms for several research problems. The main research contents are summarized as follows:

(1) A protein solubility prediction model based on a dual-channel convolutional neural network. Determining protein solubility is an important research topic with profound influence on both industrial production and biological theory. By analyzing protein solubility data, this study designs a deep dual-channel neural network model and a feature extraction method that fuses global and local features. First, a mixture of global and local features is extracted for each sample, and the SCRATCH tool is used to extract 57-dimensional features and 21 sequence features, which serve as additional features. Then, according to the characteristics of the data, a deep dual-channel neural network is designed to predict protein solubility by adjusting the combination of serial and parallel convolutional layers. The model consists of one channel with three 2D convolutional layers and one channel with a single 1D convolutional layer, which learn the hidden relationships in the mixed feature matrix group and in the additional features, respectively. A variety of comparative experiments show that the proposed method achieves superior performance.

(2) A drug-protein affinity prediction model based on word-frequency dipeptide-frequency encoding and a mixed graph convolutional neural network. Proteins are often treated as targets in developing drugs for disease treatment and virus suppression, and drug-protein affinity is an important reference index in this process. However, the performance of current deep learning algorithms for predicting drug-protein affinity is still limited. To address this problem, this work establishes a drug-protein affinity prediction model based on word-frequency dipeptide-frequency encoding and a mixed graph convolutional neural network. Dipeptide frequency features are improved with the word-frequency features used for natural language, so that protein sequences can be represented numerically. The graph structure of each drug molecule is constructed from its atoms, each represented by five different features, with the bonds between atoms mapped to the edge set. The resulting protein features and drug graphs are fed to a convolutional neural network and a graph convolutional neural network, respectively, and the model predicts drug-protein affinity by learning the hidden relationships between them. This method is more effective than existing methods.

(3) A protein function prediction system built on the protein solubility and drug-protein affinity prediction models. The system is developed in Python using packaging and multi-threading techniques, together with scientific computing libraries such as PyQt5, NumPy, scikit-learn, TensorFlow, PyTorch, and PyTorch Geometric. It follows modular programming and a hierarchical, distributed design with 3 levels and 11 functional modules that together provide 19 different functions. A fault-tolerant mechanism handles errors caused by possible user misoperation. The system provides a good and stable operating environment for users to train models and to predict protein properties and drug-protein affinity. During training, the training module dynamically displays the training process, including various evaluation metrics of the prediction model, so the training process and results are highly visualized. Trial tests of every functional module show that the prediction system has good stability and reliability.
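For item (1), the abstract does not define the global and local features. A minimal sketch, assuming the global feature is the 20-dimensional amino acid composition of the whole sequence and the local features are the same composition computed over a few contiguous segments; the rows are stacked into a small matrix of the kind a 2D convolutional channel could take as input. The names `composition` and `mixed_feature_matrix` and the default segment count are hypothetical, and the SCRATCH-derived additional features are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq; non-standard letters are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def mixed_feature_matrix(seq: str, n_segments: int = 4) -> list[list[float]]:
    """Hypothetical global + local feature matrix (assumption, not the thesis's exact method).

    Row 0 is the global composition of the whole sequence; rows 1..n_segments
    are the compositions of equal-length contiguous segments (local features).
    """
    if n_segments < 1:
        raise ValueError("n_segments must be >= 1")
    rows = [composition(seq)]
    n = len(seq)
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        rows.append(composition(seq[start:end]))
    return rows


if __name__ == "__main__":
    m = mixed_feature_matrix("MKTAYIAKQRQISFVKSHFSRQ", n_segments=4)
    print(len(m), "x", len(m[0]))
```

Because the segment count is fixed, every sample gets the same (n_segments + 1) x 20 shape regardless of sequence length, which keeps batching simple for a convolutional channel.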
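Item (2) says dipeptide frequencies are improved with word-frequency features from natural language processing but gives no formula. One plausible reading, assumed here, treats each of the 400 possible dipeptides as a "word" and weights its in-sequence frequency (TF) by an inverse document frequency (IDF) computed over a set of sequences. The smoothed IDF term below matches scikit-learn's default `smooth_idf` formula; the function names are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Each ordered pair of residues is one "word": 20 * 20 = 400 dipeptides.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_counts(seq):
    """Count overlapping dipeptides; pairs with non-standard residues are skipped."""
    counts = [0] * len(DIPEPTIDES)
    seq = seq.upper()
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:
            counts[j] += 1
    return counts


def tfidf_dipeptide_features(sequences):
    """Return one 400-dim TF-IDF vector per sequence (assumed weighting scheme).

    TF  = dipeptide count / total dipeptides in the sequence
    IDF = ln((1 + n_docs) / (1 + doc_freq)) + 1   (smoothed)
    """
    all_counts = [dipeptide_counts(s) for s in sequences]
    n_docs = len(sequences)
    doc_freq = [sum(1 for c in all_counts if c[j] > 0) for j in range(len(DIPEPTIDES))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    features = []
    for counts in all_counts:
        total = sum(counts)
        tf = [c / total if total else 0.0 for c in counts]
        features.append([t * w for t, w in zip(tf, idf)])
    return features


if __name__ == "__main__":
    feats = tfidf_dipeptide_features(["ACAC", "AAAA"])
    print(len(feats), len(feats[0]))
```

Dipeptides that occur in every training sequence get the smallest IDF weight, so the encoding emphasizes residue pairs that distinguish one protein from another, which is the usual motivation for borrowing TF-IDF from text processing.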
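Item (2) also represents each drug as a graph whose nodes are atoms described by five features and whose edges are bonds. The abstract does not name the five features; this sketch assumes a set similar to those used in GraphDTA-style models (element symbol, degree, attached hydrogens, implicit valence, aromaticity), each one-hot encoded over a small hypothetical vocabulary. It also takes an already-parsed molecule as input, since parsing SMILES would need a cheminformatics library such as RDKit. Edges are returned as a 2 x E list with both bond directions, the layout PyTorch Geometric uses for `edge_index`.

```python
from dataclasses import dataclass

# Hypothetical, truncated element vocabulary; unknown elements fall into "Other".
SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "P", "I", "Other"]


@dataclass
class Atom:
    symbol: str
    degree: int
    num_hs: int
    implicit_valence: int
    aromatic: bool


def one_hot(value, choices):
    """One-hot encode value; anything outside choices maps to the last slot."""
    vec = [0] * len(choices)
    vec[choices.index(value) if value in choices else len(choices) - 1] = 1
    return vec


def atom_features(atom):
    """Concatenate the five assumed feature groups into one node vector (length 28)."""
    return (one_hot(atom.symbol, SYMBOLS)
            + one_hot(atom.degree, list(range(6)))
            + one_hot(atom.num_hs, list(range(5)))
            + one_hot(atom.implicit_valence, list(range(6)))
            + [1 if atom.aromatic else 0])


def build_graph(atoms, bonds):
    """atoms: list of Atom; bonds: list of (i, j) atom-index pairs.

    Returns (node_features, edge_index) where edge_index = [src, dst]
    lists that contain every bond in both directions.
    """
    x = [atom_features(a) for a in atoms]
    src, dst = [], []
    for i, j in bonds:
        src += [i, j]
        dst += [j, i]
    return x, [src, dst]


if __name__ == "__main__":
    # Heavy atoms of ethanol, C-C-O, entered by hand for illustration.
    ethanol = [Atom("C", 1, 3, 3, False), Atom("C", 2, 2, 2, False), Atom("O", 1, 1, 1, False)]
    x, edge_index = build_graph(ethanol, [(0, 1), (1, 2)])
    print(len(x), len(x[0]), edge_index)
```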
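The abstract does not say which variant the "mixed" graph convolutional network in item (2) uses. For reference, the standard graph convolution of Kipf and Welling computes H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, I adds self-loops, and D is the degree matrix of A + I. The pure-Python sketch below applies one such layer to node features and a PyTorch Geometric-style edge index, then mean-pools the nodes into a single drug vector; the function names and the choice of mean pooling are assumptions.

```python
import math


def gcn_layer(x, edge_index, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    x: n x f node features; edge_index: [src, dst] lists (both directions for
    undirected bonds); weight: f x h weight matrix.
    """
    n = len(x)
    # Neighbour sets of A + I: every node is its own neighbour (self-loop).
    neighbors = [{i} for i in range(n)]
    for s, d in zip(edge_index[0], edge_index[1]):
        neighbors[d].add(s)
    deg = [len(nb) for nb in neighbors]  # row sums of A + I

    h = len(weight[0])
    # Linear transform X W.
    xw = [[sum(x[i][k] * weight[k][c] for k in range(len(weight))) for c in range(h)]
          for i in range(n)]

    out = []
    for i in range(n):
        row = [0.0] * h
        for j in neighbors[i]:
            norm = 1.0 / math.sqrt(deg[i] * deg[j])  # symmetric normalisation
            for c in range(h):
                row[c] += norm * xw[j][c]
        out.append([max(0.0, v) for v in row])  # ReLU
    return out


def mean_pool(h):
    """Average node embeddings into one graph-level vector."""
    n = len(h)
    return [sum(row[c] for row in h) / n for c in range(len(h[0]))]


if __name__ == "__main__":
    # Three-atom chain 0-1-2 with one scalar feature per node.
    x = [[1.0], [2.0], [3.0]]
    edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
    h = gcn_layer(x, edge_index, [[1.0]])
    print(h, mean_pool(h))
```

The symmetric normalisation keeps high-degree atoms from dominating their neighbours' updates; a graph-level readout such as mean or max pooling is what turns the per-atom embeddings into a fixed-size vector that can be combined with the protein features.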
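Item (3) says the training module displays various evaluation indicators but does not list them. Mean squared error (MSE) and the concordance index (CI) are commonly reported for drug-target affinity regression, so they are used here as an assumed example. CI is the fraction of pairs with different true affinities whose predictions are ordered the same way, with tied predictions counting as one half.

```python
from itertools import combinations


def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked in the same order by the predictions.

    A pair is comparable when its true affinities differ; a tie in the
    predictions counts as 0.5. O(n^2), which is fine for a display sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue
        comparable += 1
        # Orient the pair so that i has the larger true affinity.
        if y_true[i] < y_true[j]:
            i, j = j, i
        if y_pred[i] > y_pred[j]:
            concordant += 1
        elif y_pred[i] == y_pred[j]:
            concordant += 0.5
    return concordant / comparable if comparable else 0.0


if __name__ == "__main__":
    y = [5.0, 6.2, 7.1, 7.1]
    p = [5.3, 6.0, 7.4, 6.9]
    print(round(mse(y, p), 4), concordance_index(y, p))
```

CI only checks ranking, so it complements MSE: a model can rank compounds correctly for screening while still being offset in absolute affinity.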
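Item (3) combines multi-threading, dynamic display of training progress, and a fault-tolerant response to user errors. The system itself is built on PyQt5, whose code is not shown in the abstract; this sketch substitutes Python's standard `threading` and `queue` modules to illustrate the same pattern: validate user input before it reaches the model, run training in a worker thread, and send progress or error messages back through a queue instead of letting an exception crash the interface. `validate_sequence`, `train_model`, and `run_in_background` are hypothetical, and the training loop is a placeholder.

```python
import queue
import threading

VALID = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(seq):
    """Reject input a user might enter by mistake before it reaches the model."""
    cleaned = seq.strip().upper()
    if not cleaned:
        raise ValueError("empty sequence")
    bad = sorted(set(cleaned) - VALID)
    if bad:
        raise ValueError(f"invalid residues: {''.join(bad)}")
    return cleaned


def train_model(epochs, progress):
    """Placeholder training loop that reports a fake loss after each epoch."""
    loss = 1.0
    for epoch in range(1, epochs + 1):
        loss *= 0.5  # stands in for a real optimisation step
        progress.put(("epoch", epoch, loss))


def run_in_background(epochs):
    """Start training in a worker thread; return the thread and its message queue."""
    progress = queue.Queue()

    def worker():
        try:
            train_model(epochs, progress)
            progress.put(("done",))
        except Exception as exc:  # report the error to the UI instead of crashing
            progress.put(("error", str(exc)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, progress


if __name__ == "__main__":
    thread, q = run_in_background(3)
    thread.join()
    while not q.empty():
        print(q.get())
```

In a GUI the main thread would not call `join()`; it would poll the queue on a timer (or, in PyQt5, receive signals from a `QThread`) and update the progress display, so the window stays responsive while the model trains.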
Keywords/Search Tags: Protein Function, Protein Solubility, Convolutional Neural Network, Drug Target Affinity, Graph Convolutional Neural Network