Bioimage-based Protein Subcellular Location Prediction And Protein Subcellular Translocation Detection

Posted on: 2018-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Xu
GTID: 1360330590455266
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, the Human Genome Project has entered the phase of analyzing gene functions, and proteomics, as one of its core research areas, aims to annotate the proteins in cells. The subcellular location of a protein is closely related to its function and plays an important role in understanding protein metabolic activity, drug discovery, and disease diagnosis. Recognizing protein subcellular location is therefore an important part of proteomics research. Compared with traditional biological experiments and sequence-based studies, protein images describe the distribution of proteins in cells more accurately, so many studies over the past two decades have analyzed subcellular location from biological images. Most of these studies are based on fluorescence images, and relatively few use immunohistochemistry images, because immunohistochemistry images are hard to analyze: they have a large field of view, high cell density, and less distinct texture features. However, immunohistochemistry images play an irreplaceable role in the diagnosis and differential diagnosis of lesions. In this dissertation, we studied the subcellular location distribution of proteins in immunohistochemistry images, built classification models under supervised, semi-supervised, and unsupervised learning to address a series of problems, and used these models to detect cancer biomarker proteins. The main contents are as follows:

(1) Building iLocator, a protein subcellular location predictor based on immunohistochemistry images that can recognize multi-location proteins. Many subcellular location prediction models handle only single-location proteins, assuming that each protein resides at exactly one subcellular location. However, studies have found that at least 30% of human proteins are present in two or more subcellular locations. We therefore built iLocator, a predictor that can handle both single- and multi-location proteins. The predictor adds new local texture features to the traditional subcellular location features and uses two multi-label classification algorithms, binary relevance and classifier chain. In our experiments, iLocator performed well in recognizing both single- and multi-location proteins.

(2) Designing a comparison criterion to screen cancer biomarker proteins based on iLocator predictions. Some proteins change their subcellular distribution in cancerous cells, and these proteins can serve as cancer biomarkers in clinical diagnosis. We used iLocator to predict the subcellular location patterns in immunohistochemistry images of normal and cancer tissues separately, and then designed a criterion to compare the patterns in healthy and cancer images and quantify the differences. Many of the cancer biomarker proteins selected in our experiments are supported by published biological experiments.

(3) Proposing an incremental semi-supervised learning algorithm to address the small-sample problem in the training phase. The Human Protein Atlas, the database from which we selected our image datasets, contains only a small portion of high-quality images, so iLocator faces a small-sample problem during training. To tackle this, we proposed an incremental semi-supervised learning framework that selectively uses relatively low-quality samples during training to improve classification accuracy and broaden the applicability of the classification system. To improve classification performance on multi-location proteins, we also proposed a chain multi-label classification algorithm and a dynamic threshold criterion within the semi-supervised framework. The chain classification algorithm exploits the correlation between subcellular structures during model construction, and the dynamic threshold criterion determines the categories of each sample according to the distribution of the scores output by iLocator, reducing both wrongly assigned and missed labels for multi-label samples.

(4) Quantifying the distribution of multi-location proteins with an unsupervised topic model. In the Human Protein Atlas, the subcellular location annotations of immunohistochemistry images are textual descriptions, which do not specify the amount of protein at each location, especially for multi-location proteins. We used an unsupervised topic model to model the subcellular location of proteins and quantified the fraction of each protein distributed across different subcellular structures. These fractions can help researchers detect and quantify differences in the subcellular location of cancer biomarker proteins between normal and cancerous tissues. In addition, we found that proteins in the same network tend to have similar subcellular distributions. Under this assumption, we used subcellular location distributions to identify potential new members of incomplete protein networks, and many of the newly identified proteins are supported by the literature.
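As a rough illustration of the dynamic threshold idea in item (3), the sketch below assigns labels relative to each sample's own top score rather than using one fixed cutoff. The abstract does not give the actual criterion, so the rule here, the function name `dynamic_threshold_labels`, the `ratio` parameter, and the example scores are all assumptions made for illustration. Scores are assumed to be positive, for example per-location probabilities.

```python
from typing import Dict, List


def dynamic_threshold_labels(scores: Dict[str, float], ratio: float = 0.8) -> List[str]:
    """Return every label whose score is at least `ratio` times the top score.

    Hypothetical rule, not the dissertation's criterion: the cutoff adapts to
    each sample's score distribution instead of being fixed for all samples.
    Assumes positive scores (e.g. probabilities).
    """
    if not scores:
        return []
    cutoff = ratio * max(scores.values())
    # Sort by descending score so the dominant location comes first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, score in ranked if score >= cutoff]


# Made-up scores for two samples (illustrative values only).
single = {"nucleus": 0.91, "cytoplasm": 0.20, "mitochondria": 0.05}
multi = {"nucleus": 0.62, "cytoplasm": 0.55, "mitochondria": 0.08}

print(dynamic_threshold_labels(single))  # ['nucleus']
print(dynamic_threshold_labels(multi))   # ['nucleus', 'cytoplasm']
```

With a fixed cutoff such as 0.6, the second sample would lose its cytoplasm label; a per-sample cutoff keeps both locations when their scores are close, which is the kind of missed label the criterion is meant to reduce.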
Keywords/Search Tags:Subcellular localization, Immunohistochemistry image, Multi-location protein, Location biomarker protein, Semi-supervised learning, Topic model