
Development and Application of Human Extracellular Matrix Protein Prediction Tool and Reference Database

Posted on: 2021-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: B H Liu
GTID: 2370330614970433
Subject: Bioinformatics
Abstract/Summary:
Extracellular matrix (ECM) proteins are an important part of the cellular microenvironment. They not only provide mechanical and structural support for cells through interactions with other proteins, but also regulate cell functions through signal transduction. Dysregulation of the structure and function of ECM proteins can lead to severe diseases such as osteogenesis imperfecta, achondroplasia, Marfan syndrome, fibrosis and cancer. To better understand the pathogenesis of these diseases and to explore potential diagnostic and therapeutic targets, more in-depth studies of the composition and function of ECM proteins are needed. Proteomics can identify ECM proteins secreted into the extracellular space in a high-throughput manner and can characterize their extensive covalent cross-linking and modifications, which makes it a powerful technical platform for ECM protein research. At the same time, large-scale identification and characterization of ECM proteins requires both an ECM protein prediction tool and an ECM protein reference database.

At present, most efforts to develop ECM protein prediction tools and ECM reference databases have been carried out independently, and the existing tools and databases have several shortcomings. The most significant shortcoming of the existing ECM prediction tools is their lack of connection to experimentally observed biological features, especially in standard dataset construction and classification feature extraction. In addition, none of these tools is currently available for use. The main drawbacks of the existing ECM reference database are the relatively low overlap between experimentally identified and theoretically predicted ECM proteins and the way the in silico database was built: because it was constructed through a semi-empirical, manually assisted approach, it is difficult to update continually and to extend to other species. To address these problems, we combined the advantages of existing ECM prediction tools and databases and developed ECMPride, a flexible and scalable tool for predicting extracellular matrix proteins. We then applied ECMPride to all human protein sequences in the Swiss-Prot database to construct a comprehensive human ECM reference database, ECMPride DB, and its web-based application, ECMPride DB-Web. The thesis consists of four main parts:

a) Most existing ECM protein prediction tools were developed with a generic pipeline consisting of standard dataset construction, feature extraction, feature selection, model construction and evaluation. Following this pipeline, we first reviewed the existing ECM protein prediction tools step by step, summarizing the useful lessons they offer and the problems that remain to be solved. We then summarized principles for constructing ECM protein prediction tools and the corresponding solution to each unresolved problem. Finally, we assessed the reproducibility of these tools and reimplemented EcmPred in R.

b) Building on this survey of existing tools, we developed ECMPride, a flexible and scalable tool for predicting extracellular matrix proteins. ECMPride combines a more credible standard dataset, experiment-based features and robust prediction models. It is freely downloadable and is currently the only available ECM protein prediction tool. It has good sensitivity and balanced accuracy and outperforms EcmPred in prediction.

c) By applying ECMPride to all human proteins, we built ECMPride DB, a reference database of human ECM proteins. ECMPride DB covers most of the known ECM proteins in the Human Matrisome and provides additional candidate novel ECM proteins. We also built its web-based application, ECMPride DB-Web, which supports single and batch search as well as single and batch download and is intended to support research on the ECM proteome.

d) To assess the reliability of ECMPride DB, we applied it to the proteomics dataset from a published experimental ECM study. First, MaxQuant was used to process the raw mass spectrometry files and generate a list of protein identifications. Then, ECMPride DB was used to match these proteins against all putative human ECM proteins, identifying both known and potential novel ECM components. Finally, these potential novel ECM components were validated by detailed biological function annotation and protein-protein interaction analysis, and several of the top-scoring novel ECM components were further verified by immunohistochemistry and immunofluorescence experiments.

In summary, this study focused on improving the prediction tool and reference database for human ECM proteins. ECMPride, a flexible and scalable ECM protein prediction tool, performs well, with relatively good balanced accuracy and sensitivity. The new reference database ECMPride DB and its web-based application ECMPride DB-Web contain all putative human ECM components together with rich biological annotations. The database covers most known ECM proteins in the Human Matrisome, and using it to annotate experimental proteomics datasets can identify additional potential ECM proteins. Together, ECMPride, ECMPride DB and ECMPride DB-Web can serve as valuable tools and resources for future ECM-related research.
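ECMPride's performance is reported as sensitivity and balanced accuracy. As a reminder of what these two metrics measure, the minimal sketch below computes them from binary labels (1 = ECM, 0 = non-ECM). The function names and toy labels are illustrative assumptions, not taken from ECMPride. Balanced accuracy is the mean of sensitivity and specificity, so a classifier cannot score well simply by calling almost every protein non-ECM when ECM proteins are a minority of the proteome.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels, where 1 = ECM and 0 = non-ECM."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1  # ECM protein missed by the classifier
        elif pred == 0:
            tn += 1
        else:
            fp += 1  # non-ECM protein wrongly predicted as ECM
    return tp, fn, tn, fp


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true ECM proteins predicted as ECM (recall)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    specificity = tn / (tn + fp)
    return (tp / (tp + fn) + specificity) / 2


# Toy example with made-up labels: 4 ECM and 6 non-ECM proteins.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))             # (3, 1, 4, 2)
print(sensitivity(y_true, y_pred))                  # 0.75
print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.7083
```

On the same toy data plain accuracy is 7/10 = 0.7; the gap between plain and balanced accuracy widens as the classes become more imbalanced.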
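Part d) matches protein identifications from MaxQuant against ECMPride DB to separate known from potential novel ECM components. The sketch below shows one possible way to do such a lookup; it is not the code used in the thesis. It assumes each protein group arrives as a string of UniProt accessions separated by semicolons (as in MaxQuant's protein group output), and it stands in for ECMPride DB with a hypothetical dictionary mapping each accession to "known" (listed in the Human Matrisome) or "novel" (predicted only). The rule that a group is labelled "known" if any member is known, otherwise "novel" if any member is predicted, is also an assumption, as are the placeholder accessions.

```python
from typing import Dict, Iterable, List


def annotate_protein_groups(
    groups: Iterable[str], ecm_reference: Dict[str, str]
) -> Dict[str, List[str]]:
    """Split protein groups into 'known', 'novel' and 'non_ecm'.

    groups: one string per protein group, accessions separated by ';'
    ecm_reference: hypothetical stand-in for ECMPride DB,
                   accession -> 'known' | 'novel'
    """
    result: Dict[str, List[str]] = {"known": [], "novel": [], "non_ecm": []}
    for group in groups:
        accessions = [acc.strip() for acc in group.split(";") if acc.strip()]
        # Collect the reference labels of every member accession found in the DB.
        labels = {ecm_reference[acc] for acc in accessions if acc in ecm_reference}
        if "known" in labels:
            result["known"].append(group)
        elif "novel" in labels:
            result["novel"].append(group)
        else:
            result["non_ecm"].append(group)
    return result


# Placeholder accessions, not real proteins.
reference = {"ACC1": "known", "ACC2": "novel", "ACC5": "novel"}
identified = ["ACC1;ACC9", "ACC2", "ACC3;ACC4", "ACC5;ACC1"]
print(annotate_protein_groups(identified, reference))
# {'known': ['ACC1;ACC9', 'ACC5;ACC1'], 'novel': ['ACC2'], 'non_ecm': ['ACC3;ACC4']}
```

With real data, accessions may need normalizing before the lookup, for example by stripping UniProt isoform suffixes such as "-2".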
Keywords: Extracellular matrix, Proteome, Prediction tool, Database, Random forest