Research On Prediction Of Body-Fluid Proteins Based On Multi-Task Deep Learning

Posted on: 2024-02-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K He
GTID: 1520307178996859
Subject: Computer application technology
Abstract/Summary:
Body-fluid biomarkers can be detected by non-invasive or minimally invasive means and used for early disease diagnosis and health monitoring, which gives them important research value. Research on body-fluid biomarkers has achieved some success, and many biomarkers have been applied to the clinical diagnosis of human diseases. However, body-fluid biomarkers with high specificity still need to be discovered. Body-fluid proteins are an important source of body-fluid biomarkers, so detecting body-fluid proteins helps identify such biomarkers. Experimental detection remains difficult because of the complexity of human body fluids and the high cost of biological experiments. Bioinformatics-based computational methods can predict body-fluid proteins and thereby advance research on body-fluid biomarkers.

Many prediction methods have been proposed. They fall into two categories: machine learning (ML)-based methods and deep learning (DL)-based methods. ML-based methods are simple and fast, but their prediction accuracy is limited. DL-based methods usually achieve higher prediction accuracy, but they are prone to overfitting. Existing methods also share the following limitations. First, the body fluids studied are not comprehensive; some body fluids have not yet been studied. Second, most methods study body fluids individually and cannot exploit the relationships between different body fluids. Finally, most methods are based on supervised learning, which requires manually constructed negative datasets, and these datasets limit further improvement of the prediction methods.

To address these limitations, this paper focuses on body-fluid protein prediction based on deep multi-task learning. Specifically, it combines DL, multi-task learning (MTL), and positive-unlabeled (PU) learning to propose three body-fluid protein prediction methods. First, this paper studies supervised learning algorithms and proposes a multi-task deep learning prediction framework for the study of multiple human body fluids. Next, to overcome the difficulty of obtaining negative samples, it turns to semi-supervised learning algorithms and proposes a semi-supervised prediction model, based on PU learning, for the prediction of cerebrospinal fluid (CSF) proteins. Finally, it further improves the semi-supervised prediction model: by incorporating multi-task learning, a semi-supervised multi-label prediction model is proposed for the comprehensive semi-supervised study of multiple body fluids. The main research contents are as follows:

1. A supervised deep learning multi-label prediction method, named MultiSec, is proposed for the prediction of body-fluid proteins. Because the method is a supervised learning algorithm, it needs to generate negative samples for the study of 17 human body fluids. The method conducts a comprehensive study of common human body fluids, 5 of which are studied for the first time. In addition, because it is based on multi-task learning, it can exploit the relationships between different human body fluids to further improve prediction accuracy. First, MultiSec designs a lightweight convolutional neural network (CNN) and shares the feature extraction part of the CNN across the 17 body-fluid prediction tasks, which significantly reduces overfitting of the prediction model. Then, a balanced sampling strategy dynamically generates balanced training data, which overcomes the imbalance of the body-fluid datasets. Finally, the CNN is trained with a multiple-gradient descent algorithm (MGDA), which effectively prevents conflicts between the different body-fluid prediction tasks. MultiSec was trained and evaluated on data from 17 human body fluids and achieved an average AUC of 93.13%. Experimental results show that MultiSec not only improves the prediction accuracy of body-fluid proteins but also has lower algorithmic complexity.

2. A semi-supervised prediction method is proposed for the prediction of CSF proteins. Because this method is based on semi-supervised learning, it can model CSF proteins without generating negative samples. CSF is an important source of biomarkers for central nervous system diseases because it reflects physiological changes in the human brain better than other body fluids, so CSF protein prediction is of great research value. First, the method collects a large number of common protein features as the input of the prediction model. Then, important features are selected through the rank-sum test, false discovery rate, and recursive feature elimination. Finally, the CSF proteins are modeled through a deep neural network (DNN) and ensemble learning. Experimental results show that this method is more accurate than other prediction methods. In addition, the prediction results were applied to biomarker discovery for glioma, and potential glioma biomarkers in CSF were successfully identified.

3. An improved semi-supervised deep learning multi-label prediction method, named MPUSec, is proposed for the prediction of body-fluid proteins. Because this method is a semi-supervised algorithm, it can model 17 kinds of body-fluid proteins without generating negative samples. First, an improved CNN is designed, which not only reduces the number of convolutional parameters but also extracts multi-scale protein features. Then, a single-label prediction method is constructed for the semi-supervised study of body-fluid proteins: the semi-supervised prediction task is converted into multiple binary classification tasks and solved through a multi-task deep learning method. Finally, the study of a single body fluid is extended to multiple body fluids with a semi-supervised multi-label prediction method: the prediction tasks of multiple body fluids are converted into multiple groups of binary classification tasks, and a two-stage multi-task CNN is designed for the study of multiple body fluids. MPUSec was trained and evaluated on data from 17 body fluids. The experimental results show that MPUSec is significantly better than the other methods compared.

In summary, this paper proposes several body-fluid protein prediction methods based on multi-task deep learning. First, it studies supervised learning and proposes a supervised multi-label prediction method to model body-fluid proteins from 17 human body fluids. Then, it studies the proteins in a single body fluid through semi-supervised learning and proposes a semi-supervised prediction method to model CSF proteins. Finally, it carries out a comprehensive semi-supervised study of 17 human body fluids and proposes a semi-supervised multi-label method to model their body-fluid proteins. Experimental results show that the prediction methods proposed in this paper achieve high prediction accuracy across the 17 human body fluids.
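
The abstract states that MultiSec trains its shared CNN with a multiple-gradient descent algorithm (MGDA) to prevent conflicts between prediction tasks. As a rough illustration only, the Python sketch below shows the standard closed-form MGDA step for the simplified case of two tasks: it finds the minimum-norm convex combination of the two task gradients on the shared parameters. The function names (`min_norm_two_tasks`, `combine`) and the toy gradients are hypothetical and are not taken from the dissertation, whose 17-task implementation is not described in this abstract and may differ.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_two_tasks(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Return gamma in [0, 1] minimizing ||gamma*g1 + (1-gamma)*g2||^2.

    Setting the derivative to zero gives
    gamma = ((g2 - g1) . g2) / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = [b - a for a, b in zip(g1, g2)]  # g2 - g1
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every gamma yields the same combination.
        return 0.5
    gamma = dot(diff, g2) / denom
    return min(1.0, max(0.0, gamma))


def combine(g1: Sequence[float], g2: Sequence[float]) -> List[float]:
    """Minimum-norm convex combination of two task gradients."""
    gamma = min_norm_two_tasks(g1, g2)
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(g1, g2)]


if __name__ == "__main__":
    # Toy gradients of two hypothetical body-fluid tasks
    # with respect to the shared feature extractor.
    grad_task_a = [1.0, 0.0]
    grad_task_b = [0.0, 1.0]
    print(combine(grad_task_a, grad_task_b))  # prints [0.5, 0.5]
```

Under these assumptions, the combined direction has a non-negative inner product with each task gradient, so a small step against it does not increase either task loss to first order. When the two gradients point in exactly opposite directions with equal size, the combination is the zero vector, which signals that no shared step can improve both tasks at once.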
Keywords/Search Tags:Deep Learning, Multi-Task Learning, Positive-Unlabeled Learning, Body-Fluid Secreted Proteins, Body-Fluid Biomarkers