Assessment And Preliminary Performance Improvement Of Cancer Driver Missense Mutation Prediction Methods

Posted on: 2020-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Sun
GTID: 2370330575471071
Subject: Biology
Abstract/Summary:
Cancer is a complex genetic disease that is generally caused by the accumulation of many mutations in the human genome. Among these accumulated mutations, only a few play a key role in the development of cancer; these are called driver mutations. Driver mutations generally alter the molecular machinery of normal cells and stimulate the growth of tumor cells. Because the cancer genome is unstable, driver mutations are usually accompanied by many passenger mutations, which are generally not involved in the development or treatment of cancer. Since driver mutations serve as molecular markers for cancer diagnosis and prognosis and as targets for the development or action of cancer-related drugs, it is very important to distinguish them from the numerous other mutations in the cancer genome.

Missense mutations are the most abundant type of mutation in the genome, and several strategies have been applied to recognizing cancer driver missense mutations. First, they can be identified through biological experiments, but such experiments are time-consuming and labor-intensive, which makes it difficult to process and mine the massive mutation data generated by sequencing projects. Second, traditional statistical methods can be used, but they require a large number of relevant tumor samples, which are generally difficult to obtain. Third, computational methods based on sequence-site conservation, protein structure and functional attributes have been developed to predict driver mutations that have a functional impact on the occurrence and development of cancer.

Many computational methods exist for predicting cancer driver mutations, and they differ in their design. Previous evaluations of these tools have clearly shown that different tools exhibit predictive bias, and have proposed ensemble methods in response; however, they did not analyze this bias further. Using prediction results on benchmark test sets (cancer-related, representative and non-redundant), we systematically evaluated the predictive power of cancer driver mutation prediction tools, and then constructed a cancer driver missense mutation prediction model based on high-quality negative samples. The detailed work is as follows.

Firstly, the existing missense driver mutation prediction tools were analyzed and compared. A total of 34 missense driver mutation prediction tools (including five conservation score prediction methods) were collected, and their prediction performance was compared on 6 benchmark test sets. The results showed that cancer-specific mutation prediction tools have lower overall predictive power than general disease mutation prediction tools, mainly because they predict negative samples poorly, and therefore need to be improved.

Secondly, we constructed a cancer driver missense mutation prediction model, CMMPred (Cancer Missense Mutation Predictor), based on high-quality negative samples. The positive and negative samples of the CMMPred model are derived from the CHASM model training set and the dbCPM, respectively. An 85-dimensional feature space was generated for the samples with the CRAVAT tool, and CMMPred was then built using the XGBoost algorithm. On the benchmark test set, CMMPred reached an AUC of 0.77, an accuracy of 0.70, a sensitivity of 0.75 and a specificity of 0.66. CMMPred showed the best overall predictive power among the tools compared, with an AUC 7 percentage points higher than that of PolyPhen2, the next-best tool.

In summary, the results demonstrate that manually annotated, high-quality cancer passenger mutations improve the prediction of cancer driver mutations.
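
The four metrics reported for CMMPred (AUC, accuracy, sensitivity and specificity) can be illustrated with a minimal Python sketch. This is not the thesis's evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for illustration only, with label 1 standing for a driver mutation and label 0 for a passenger mutation.

```python
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); a score >= threshold is predicted as driver (1).

    The 0.5 threshold is an illustrative assumption, not a value from the thesis.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 0 and predicted == 1:
            fp += 1
        elif y == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true drivers (positives) that are predicted as drivers."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true passengers (negatives) that are predicted as passengers."""
    return tn / (tn + fp)


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four drivers followed by four passengers.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(toy_labels, toy_scores)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", accuracy(tp, fp, tn, fn))
print("AUC:", auc(toy_labels, toy_scores))
```

Specificity is the metric that depends directly on the negative (passenger) samples, which is why a tool that handles negative samples poorly scores low on it even when its sensitivity is acceptable.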
Keywords/Search Tags: Cancer, Driver mutation, Passenger mutation, Missense mutation, Machine learning