
Prediction And Application Of Shed Membrane Protein Based On Spark

Posted on: 2017-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: L H Wang
GTID: 2180330482989823
Subject: Computer technology
Abstract/Summary:
Membrane proteins are proteins that interact with biological membranes. Most secreted proteins are associated with membrane proteins, and membrane proteins can also act as receptors in various signaling pathways. Membrane proteins are therefore important in modern medicine and biology: more than half of current drug targets are membrane proteins. Previous studies have shown that the extracellular domain of a membrane protein can be released into the secretome by proteolysis, a process known as "ectodomain shedding". Only about 4% of membrane proteins are released into the secretome in this way. During ectodomain shedding, membrane proteins bound to the cell surface may be released and become part of the secretome, and the process can affect many kinds of molecules. It is mediated mainly by matrix metalloproteases (MMPs) or by ADAMs ("a disintegrin and metalloprotease"). Ectodomain shedding is associated with many serious diseases, such as inflammation, cancer, rheumatoid arthritis and Alzheimer's disease. Moreover, because the secretome is present in blood, saliva and urine, shed membrane proteins in the secretome are easily accessible. However, to the best of our knowledge, there is still no dedicated prediction model for this event, so a specialized tool for predicting shedding is needed.
With the development of modern technology, data in every field of bioinformatics are growing explosively. Abundant data bring new opportunities for research, but large data volumes also demand more computing power, and a standalone computing platform cannot meet this challenge. Distributed computing platforms, with their high efficiency, good scalability and ease of use, provide a new solution.
In this work the data contain complex feature elements, and processing them on a standalone platform would waste considerable time and resources. This thesis therefore presents a Spark-based prediction model for shed membrane proteins that predicts them more accurately and efficiently. To build the model, we first collected sufficient membrane protein datasets; the shed membrane proteins were treated as positive samples and the remaining non-shed membrane proteins as negative samples. After the membrane protein data were initialized, features were selected with a modified mRMR (minimum redundancy maximum relevance) method, yielding a ranked list of feature elements. Finally, the prediction model was built with a support vector machine (SVM) on Spark. During this process the prediction performance of the model was evaluated, and the feature elements that gave the best prediction performance were selected. Experimental results show that many of the predicted shed membrane proteins have been confirmed experimentally, and some are highly expressed in patients in vivo. Our model achieves better prediction performance, and the predicted shed membrane proteins can be treated as potential biomarkers for many serious diseases. The shed membrane protein prediction model can therefore play an essential role in clinical medicine and other fields.
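The thesis ranks features with a modified mRMR whose modifications are not described in the abstract. As a rough illustration only, the sketch below implements the standard mRMR criterion in its MID (mutual information difference) form: at each step it picks the feature x_j that maximizes I(x_j; c) - (1/|S|) * sum over i in S of I(x_j; x_i), that is, relevance to the class label c (shed vs. non-shed) minus mean redundancy with the already selected set S. It assumes the features have already been discretized; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information I(X;Y) in nats between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_rank(features: List[List[int]], labels: Sequence[int]) -> List[int]:
    """Rank feature columns greedily by the standard mRMR MID criterion.

    features[i][j] is the discretized value of feature j for sample i.
    Returns column indices, best first.
    """
    n_features = len(features[0])
    columns = [[row[j] for row in features] for j in range(n_features)]
    relevance = [mutual_information(col, labels) for col in columns]
    selected: List[int] = []
    remaining = set(range(n_features))

    while remaining:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]  # first pick: relevance only
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy

        # Iterate in index order so ties go to the lowest column index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: columns are a, an exact copy of a, and b; the label is (a OR b).
    X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    y = [0, 1, 1, 1]
    print(mrmr_rank(X, y))  # the duplicated column 1 is ranked last
```

In the toy data, column 1 duplicates column 0, so although its relevance on its own equals column 0's, it adds only redundant information once a related feature is chosen and falls to the end of the ranking. A plain relevance-only ranking would place it near the top.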
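The abstract says the SVM is trained on Spark and that the best-performing feature elements are chosen by evaluating the model, but gives no details. The sketch below is a single-process, stdlib-only stand-in under stated assumptions: the training set is split into partitions, a "map" step computes a hinge-loss subgradient per partition, and a "reduce" step sums them, similar in spirit to how Spark MLlib's gradient-descent SVM trainer aggregates gradients across an RDD. The outer loop is an incremental feature selection that trains on the top-k features of a given ranking for k = 1, 2, ... and keeps the best k. The partitioning, step size, regularization, the held-out accuracy metric and all function names are hypothetical; shed proteins are labeled +1 and non-shed proteins -1.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, int]          # (feature vector, label: +1 = shed, -1 = non-shed)
Partial = Tuple[Vector, float, int]  # (summed w-subgradient, summed b-subgradient, sample count)


def partial_gradient(partition: Sequence[Sample], w: Vector, b: float) -> Partial:
    """'Map' step: sum the hinge-loss subgradients of the samples in one partition."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in partition:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1.0:  # only margin violators contribute to the hinge subgradient
            for i, xi in enumerate(x):
                gw[i] -= y * xi
            gb -= y
    return gw, gb, len(partition)


def combine(p: Partial, q: Partial) -> Partial:
    """'Reduce' step: add two partial results component-wise."""
    return [a + c for a, c in zip(p[0], q[0])], p[1] + q[1], p[2] + q[2]


def train_linear_svm(partitions: List[List[Sample]], dim: int, reg: float = 0.01,
                     step: float = 0.1, iterations: int = 200) -> Tuple[Vector, float]:
    """Minimize reg/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent."""
    w, b = [0.0] * dim, 0.0
    for t in range(1, iterations + 1):
        gw, gb, n = reduce(combine, (partial_gradient(p, w, b) for p in partitions))
        eta = step / t ** 0.5  # decaying step size
        w = [wi - eta * (reg * wi + gi / n) for wi, gi in zip(w, gw)]
        b -= eta * gb / n  # the bias term is not regularized
    return w, b


def accuracy(w: Vector, b: float, samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for x, y in samples
               if (sum(wi * xi for wi, xi in zip(w, x)) + b >= 0) == (y == 1))
    return hits / len(samples)


def incremental_feature_selection(train: List[Sample], test: List[Sample],
                                  ranking: List[int], n_partitions: int = 2) -> Tuple[int, float]:
    """Evaluate the top-k ranked features for k = 1..len(ranking); return (best k, accuracy)."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]

        def project(s: Sample) -> Sample:
            return [s[0][c] for c in cols], s[1]

        projected = [project(s) for s in train]
        # Round-robin split, standing in for the partitions of a distributed dataset.
        partitions = [projected[i::n_partitions] for i in range(n_partitions)]
        w, b = train_linear_svm(partitions, dim=k)
        acc = accuracy(w, b, [project(s) for s in test])
        if acc > best_acc:  # strict '>' keeps the smallest k on ties
            best_k, best_acc = k, acc
    return best_k, best_acc


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    train = [([1.0, 0.3], 1), ([2.0, -0.5], 1), ([-1.0, 0.4], -1), ([-2.0, -0.2], -1)]
    test = [([1.5, -0.9], 1), ([-1.5, 0.8], -1), ([0.5, 0.1], 1), ([-0.5, -0.1], -1)]
    print(incremental_feature_selection(train, test, ranking=[0, 1]))  # (1, 1.0)
```

Because the per-partition work only reads its own samples and the combine step is associative, the same structure maps onto a cluster's map and aggregate operations. On an actual Spark deployment one would normally use MLlib's built-in linear SVM trainers (for example `SVMWithSGD` in the RDD API or `LinearSVC` in the DataFrame API) rather than hand-written gradient code.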
Keywords/Search Tags: Membrane Protein, Ectodomain Shedding, Feature Selection, Support Vector Machine, Spark