| Protein subcellular localization is one of the most important research tasks in proteomics. After DNA transcription and translation produce functional proteins, sorting signals direct each protein to the subcellular location where it can carry out its function. If any link in this process breaks, pathological disorders result. Protein subcellular localization is therefore of great research significance. At present, wet-lab experiments and data-driven methods are the two main approaches to determining protein subcellular locations. Because they save capital, time, and effort, data-driven methods have advanced rapidly in recent years. Data-driven methods generally follow three steps: collecting and preprocessing protein data; quantifying protein data; and designing classifiers. Quantification is a crucial step in data-driven models, because a good quantitative representation adequately describes the specific properties of each sample. Although prior research has significantly advanced the theory of protein subcellular location prediction, several issues remain. First, statistical operators rely on sample correlation, have a limited capacity to represent intra-class and inter-class variation, and provide weak supervision; second, general deep learning models for representation learning are prone to feature gradient dispersion, which lowers the robustness and discriminative power of the abstract features; finally, many methods concentrate on a single kind of protein signal, which restricts the scope of the models. In this paper, we developed three protein subcellular localization prediction models by analyzing these problems and employing residual units and attention mechanisms. First, HAR_Locator: a novel protein subcellular location prediction model for immunohistochemistry (IHC) images based on hybrid attention modules and residual units. Using IHC images as the research object, statistical features were
extracted as shallow features using traditional statistical operators. In particular, a novel abstract feature model named HARnet was proposed based on hybrid attention mechanisms and residual units. Feature maps from different depths of HARnet were concatenated with the shallow features to enhance supervision. The resulting integrated space of shallow and abstract features was then used to identify the subcellular localization of proteins with a combination of a support vector machine and an artificial neural network. The experimental results showed that HAR_Locator reaches 84.73% accuracy and outperforms other baseline models. Second, IDRnet: a novel pixel-enlightened neural network for predicting protein subcellular location based on interactive pointwise attention. IDRnet takes IHC patch images as the research object. In its architecture, residual units form the trunk network to relieve gradient dispersion, a spatial convolution module improves information richness, and interactive pointwise attention, built on spatial convolution and an interactive algorithm, learns pixel-level characteristics of protein-target regions. The experimental results showed that IDRnet reaches 84.73% subset accuracy on mixed-label datasets and outperforms other deep learning models. Third, dual-signal feature spaces that map protein subcellular locations based on IHC images and protein sequences. Protein IHC images and sequences were introduced as research objects, and a label-consistent benchmark dataset was constructed. Multiple sub-classifiers were built from different features of the IHC images and protein sequences. Finally, a voting mechanism and binary relevance were used to learn protein subcellular locations from the dual signals. The experimental results show that the dual-source signal model scores better on the performance evaluation indices than the single-source signal models. The three design schemes developed in this paper have been
experimentally confirmed to significantly enhance the attribution of sample categories, improve the performance of protein subcellular localization, and provide three effective automated prediction models for protein subcellular localization research. This paper builds on the theoretical work of its predecessors, addresses the key issue of the weak specificity representation ability of IHC images, proposes a dual-signal protein scheme, and advances the theoretical basis for future multimodal fusion research. |
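
The abstract mentions fusing sub-classifiers with a voting mechanism under binary relevance, and reports subset accuracy on mixed-label data. The sketch below illustrates one common reading of those two ideas; it is not the authors' implementation. The function names, the simple majority rule (ties counted as negative), and the toy predictions are all assumptions made for illustration.

```python
from typing import List, Sequence

Labels = List[int]  # one 0/1 indicator per subcellular location


def vote_binary_relevance(predictions: Sequence[Labels]) -> Labels:
    """Fuse sub-classifier outputs with an independent majority vote per label.

    Assumption: a label is assigned only when strictly more than half of the
    sub-classifiers predict it (ties count as negative).
    """
    n_classifiers = len(predictions)
    n_labels = len(predictions[0])
    fused = []
    for j in range(n_labels):
        votes = sum(p[j] for p in predictions)
        fused.append(1 if 2 * votes > n_classifiers else 0)
    return fused


def subset_accuracy(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Fraction of samples whose complete label set is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Hypothetical outputs of three sub-classifiers for one protein, three locations.
image_clf_a = [1, 0, 1]
image_clf_b = [1, 0, 0]
sequence_clf = [0, 1, 1]

fused = vote_binary_relevance([image_clf_a, image_clf_b, sequence_clf])
print(fused)
```

Because subset accuracy credits a sample only when every label matches, it is a stricter measure than per-label accuracy for proteins that occupy more than one location.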