
Research On Protein Subcellular Location Method Based On Deep Learning

Posted on: 2021-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Che
GTID: 2480306557489704
Subject: Software engineering
Abstract/Summary:
Protein subcellular localization is an important topic in proteomics and an active research area in bioinformatics. Images that visualize proteins in cells are widely used in biomedical research. They are important for studying the pathogenesis of certain diseases and for drug design and discovery, and they may hold the key to the next breakthrough in medicine. At present, image-based protein subcellular localization is studied with two families of methods: traditional machine learning and deep learning. Traditional machine-learning methods require hand-crafted features, which is time-consuming and makes fully automatic protein subcellular localization difficult. Existing deep-learning methods generally target a single cell type and do not generalize. When protein subcellular localization is treated as multi-label image classification, these methods require a separate neural network for each subcellular structure. Because a cell contains many such structures, current deep-learning methods lack flexibility.

To address these shortcomings, this thesis studies the multi-label image classification problem abstracted from protein subcellular localization, and it investigates and implements three classic deep-learning methods for this problem. The resulting approach models two kinds of dependency. The first is the dependency between subcellular labels. The second is the dependency between each subcellular label and the corresponding regions of the stained image. Together these improve protein subcellular localization. The main work is as follows:

1) Three deep-learning multi-label classification methods, CNN-RNN, SRN and ML-GCN, are studied, implemented and applied to protein subcellular localization. To address the shortcomings of CNN-RNN, an improved CNN-LSTM is proposed. It uses the LSTM memory cell to learn two things at once: the image features of the stained protein and the dependencies among subcellular labels. Experiments show that the proposed CNN-LSTM outperforms CNN-RNN.

2) An analysis of the strengths and weaknesses of CNN-LSTM, SRN and ML-GCN shows a complementary split. CNN-LSTM and ML-GCN model only label-to-label dependencies. SRN models only the dependencies between labels and the corresponding image regions. To combine their strengths through multi-input fusion of neural networks, two fusion models are proposed: LSTM-SRN (fusing CNN-LSTM and SRN) and GC-SRN (fusing ML-GCN and SRN). Experiments show that both models improve protein subcellular localization. GC-SRN performs best and outperforms the other methods on the same dataset.

3) The number of samples per subcellular class is imbalanced. To address this, Focal Loss, a modification of the binary cross-entropy loss, is used to reduce the weight of easily classified subcellular samples. Training therefore concentrates on the hard-to-classify subcellular structures, which also mitigates the imbalance in the training samples. In addition, GC-SRN is optimized through model quantization and model compression. Applying INT8 quantization to the trained model greatly increases inference speed with only a small loss of accuracy. The quantization also acts as a form of regularization, which reduces overfitting to some extent. Channel pruning removes the network channels that contribute little to model performance. This reduces the number of parameters, which speeds up inference and shrinks the model's memory footprint.
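ML-GCN, named in items 1) and 2), runs label embeddings through a graph convolutional network. The adjacency matrix of that network is derived from label co-occurrence statistics in the training set. The minimal sketch below covers only that adjacency construction and rests on several assumptions. The conditional probability P(L_j | L_i) = M_ij / N_i is binarized with a threshold tau. The result is then re-weighted so that each label keeps weight 1 - p for itself and spreads p over its neighbours, following the re-weighting scheme described for ML-GCN. The values of tau and p, the function name `label_correlation_matrix`, and the handling of labels with no neighbours are hypothetical choices, not taken from the thesis.

```python
def label_correlation_matrix(label_sets, num_labels, tau=0.4, p=0.2):
    """Hypothetical helper: ML-GCN-style label adjacency from co-occurrence counts.

    label_sets: one set of label indices per training image.
    tau, p: assumed threshold and re-weighting values (not from the thesis).
    """
    counts = [0] * num_labels                            # N_i: images containing label i
    co = [[0] * num_labels for _ in range(num_labels)]   # M_ij: images containing i and j
    for labels in label_sets:
        for i in labels:
            counts[i] += 1
            for j in labels:
                if i != j:
                    co[i][j] += 1

    # Binarize the conditional probability P(L_j | L_i) = M_ij / N_i with threshold tau.
    binary = [
        [1 if i != j and counts[i] > 0 and co[i][j] / counts[i] >= tau else 0
         for j in range(num_labels)]
        for i in range(num_labels)
    ]

    # Re-weight: neighbours share weight p, the label itself keeps 1 - p.
    adj = []
    for i in range(num_labels):
        degree = sum(binary[i])
        if degree == 0:
            # Assumption of this sketch: an isolated label keeps only its own feature.
            row = [0.0] * num_labels
            row[i] = 1.0
        else:
            row = [p * binary[i][j] / degree for j in range(num_labels)]
            row[i] = 1.0 - p
        adj.append(row)
    return adj
```

For example, if two of three images containing label 0 also contain label 1, then P(L_1 | L_0) = 2/3. With tau = 0.5 that edge survives binarization, so label 1 receives weight p in label 0's row. Each row of the result sums to 1.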
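Focal Loss, used in item 3), reshapes binary cross-entropy so that confidently classified examples contribute little to training. For one label with predicted probability p and target y in {0, 1}, let p_t = p if y = 1 and 1 - p otherwise. The loss is then FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The minimal sketch below applies the loss per label and averages over the labels of one image, as a multi-label classifier with one sigmoid output per subcellular structure would. The defaults gamma = 2 and alpha = 0.25 are those of the original Focal Loss paper, not values reported in the thesis. The function names are hypothetical.

```python
import math


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Plain BCE for one label; p is the predicted probability, y is 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Focal loss for one label: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha defaults are assumptions (Focal Loss paper values).
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Hypothetical helper: mean focal loss over the labels of one image."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)
```

With these defaults, the ratio FL / BCE equals alpha_t * (1 - p_t)^gamma. For a confidently correct prediction (p = 0.95, y = 1), the loss shrinks by a factor of 1600 relative to BCE. For a badly wrong one (p = 0.1, y = 1), it shrinks only by a factor of about 5. This is the down-weighting of easy samples that the abstract describes. Setting gamma = 0 recovers an alpha-weighted BCE.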
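The INT8 quantization in item 3) maps floating-point weights onto 8-bit integers. The minimal sketch below assumes the simplest scheme, symmetric per-tensor quantization of weights only, with scale = max|w| / 127. The thesis does not state its scheme, granularity or toolkit. Real post-training toolkits also calibrate activation ranges, which this sketch omits. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Hypothetical helper: symmetric per-tensor INT8 quantization.

    Returns (integer codes in [-127, 127], float scale).
    """
    max_abs = max(abs(w) for w in weights)
    # Map [-max_abs, max_abs] onto [-127, 127]; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from INT8 codes."""
    return [q * scale for q in codes]
```

Each weight is reconstructed to within half a quantization step (scale / 2). Storing 8-bit codes plus one scale instead of 32-bit floats cuts weight memory to roughly a quarter.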
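The channel pruning in item 3) needs a criterion for deciding which channels are "not important". The thesis does not state its criterion. The minimal sketch below assumes the common L1-norm heuristic: it ranks each output channel by the sum of absolute values of its filter weights and removes the lowest-ranked fraction. The representations are simplified and hypothetical. Each output channel's filter is a flat list, and the next layer is a matrix whose column i belongs to input channel i, as in a 1x1 convolution or a fully connected layer. All function names are also hypothetical. Removing an output channel must also remove the matching input column of the next layer; that paired removal is what actually reduces parameters and computation.

```python
def l1_norms(filters):
    """L1 norm of each output channel's filter (one flat weight list per channel)."""
    return [sum(abs(w) for w in f) for f in filters]


def select_channels_to_keep(filters, prune_ratio):
    """Assumed criterion: drop the prune_ratio fraction of channels with lowest L1 norm."""
    n = len(filters)
    n_prune = int(n * prune_ratio)
    norms = l1_norms(filters)
    order = sorted(range(n), key=lambda i: norms[i])  # least important first
    pruned = set(order[:n_prune])
    return [i for i in range(n) if i not in pruned]


def prune_layer(filters, next_layer_weights, prune_ratio):
    """Hypothetical helper: remove pruned output channels and the next layer's matching inputs.

    next_layer_weights: one row per next-layer output; row[i] pairs with input channel i.
    """
    keep = select_channels_to_keep(filters, prune_ratio)
    new_filters = [filters[i] for i in keep]
    new_next = [[row[i] for i in keep] for row in next_layer_weights]
    return new_filters, new_next, keep
```

For example, with four channels whose L1 norms are 0.2, 3.0, 0.05 and 3.5, a prune ratio of 0.5 keeps channels 1 and 3. The same two columns are kept in the following layer.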
Keywords/Search Tags: Deep learning, Protein subcellular localization, Convolutional Neural Network, Multi-label image recognition