Cancer-related Gene Identification Method Based On Protein Network And Random Forest

Posted on:2022-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:R Zhao

Full Text:PDF

GTID:2504306728986749

Subject:Computer Science and Technology

Abstract/Summary:

Cancer is a major threat to human health. Over the past few decades, many investigators have devoted themselves to studying the main pathogenic factors of cancer, and related genes have been discovered for different types of cancer. However, hidden cancer-related genes still remain to be discovered. The topological structure of the protein interaction network has become an important resource for the study of cancer-related genes. Using this structure, network embedding methods can obtain feature vectors of gene nodes from the network, and machine learning algorithms can then learn from these feature vectors to build models that identify potential cancer-related genes. Based on these ideas, this thesis proposes prediction models for two important types of cancer-related genes (oncogenes and tumor suppressor genes). The main contents are as follows.

Extensive research on tumor suppressor genes helps to understand the pathogenesis of cancer and to design effective treatments. However, identifying tumor suppressor genes with traditional experiments is costly and time-consuming, so effective computational methods are needed to screen out potential tumor suppressor genes. Some computational methods have been proposed to predict new tumor suppressor genes, but most of them do not include a learning process to extract the basic attributes of validated tumor suppressor genes, which reduces their efficiency. In this study, a novel computational method is proposed to identify potential tumor suppressor genes. Validated tumor suppressor genes were downloaded from the TSGene database (version 1.0). These tumor suppressor genes, together with other genes, were represented by features extracted from the protein interaction network via the network embedding method Mashup. Several random forest models were then constructed and used to predict potential tumor suppressor genes. Evaluated against the validated tumor suppressor genes in the TSGene database (version 2.0), the method performs better than some previously proposed methods.

An oncogene is a gene that can promote the occurrence of tumors, and the study of oncogenes helps to understand the causes of cancer. Biological experiment techniques were long the popular way to detect oncogenes. In recent years, however, their shortcomings, such as high cost and long duration, have become increasingly obvious. Considering the limitations of some previous computational methods, this research proposes a novel computational method for identifying oncogenes. It constructs a protein interaction network and adopts the network embedding method Mashup to extract features from the network. The classic machine learning algorithm random forest is applied to these features to capture the essential information of oncogenes and thereby build the prediction model. All genes are then ranked according to the scores produced by the prediction model. Evaluated with classic evaluation metrics, the method performs better than some other methods. The top-ranked unlabeled genes are completely different from the potential oncogenes discovered by previous methods and are, with high likelihood, new oncogenes.
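
The feature-extraction step described above can be illustrated with a small sketch. As published, Mashup starts from random walk with restart (RWR) on the network to obtain a "diffusion state" for every gene, and then compresses those states into low-dimensional vectors. The code below shows only the RWR stage on a toy network. The gene names `G1` to `G5`, the edge list, and the restart probability of 0.5 are assumptions chosen for illustration and do not come from the thesis, and the compression stage is omitted.

```python
def rwr_diffusion(edges, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from every gene of an undirected network.

    Returns {seed_gene: {gene: probability}}; each inner dict is the
    stationary visiting distribution of a walker that restarts at seed_gene.
    """
    genes = sorted({g for edge in edges for g in edge})
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    states = {}
    for seed in genes:
        # Start with all probability mass on the seed gene.
        p = {g: 1.0 if g == seed else 0.0 for g in genes}
        for _ in range(max_iter):
            # With probability `restart` the walker jumps back to the seed.
            nxt = {g: (restart if g == seed else 0.0) for g in genes}
            # Otherwise it moves to a uniformly chosen neighbor.
            for g in genes:
                share = (1.0 - restart) * p[g] / len(neighbors[g])
                for n in neighbors[g]:
                    nxt[n] += share
            delta = sum(abs(nxt[g] - p[g]) for g in genes)
            p = nxt
            if delta < tol:
                break
        states[seed] = p
    return states


# Hypothetical toy network (not real interaction data).
TOY_EDGES = [("G1", "G2"), ("G2", "G3"), ("G3", "G1"), ("G3", "G4"), ("G4", "G5")]

states = rwr_diffusion(TOY_EDGES)
for gene, row in states.items():
    print(gene, [round(row[g], 3) for g in sorted(row)])
```

Each printed row is one gene's raw feature vector. In a pipeline like the one summarized in this abstract, vectors of this kind (after dimensionality reduction) would be the input to a classifier such as random forest, whose output scores would then be used to rank unlabeled genes.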

Keywords/Search Tags:

protein-protein interaction, cancer-related genes, tumor suppressor genes, oncogenes, network embedding methods, random forest

Related items

1	Research On Disease Genes Prediction Algorithm In Protein-Protein Interaction Network
2	Screening And Validation Of Possible Relative Oncogenes/Tumor Suppressor Genes In Nasopharyngeal Carcinoma Cell Lines
3	The Analysis Of DNA Methylation And Histone Modification Of Tumor Suppressor Genes And Proto-Oncogenes
4	The KLF6 Tumor Suppressor: Post-Translational Modification by GSK3-beta, Cancer-Related Target Genes, and Novel Protein Interactions
5	Prioritization Of Candidate Disease Genes Based On Topological Similarity And Optimized PPI Network
6	Prioritization Of Candidate Disease Genes By Combining Topological Similarity With Semantic Similarity
7	Protein-protein Interaction Network Analysis Of Genes Related To Obesity
8	Meta-analysis Of mRNA Expression Profiles To Identify Hub Genes And Pathways In Differentiated Ovarian Cancer
9	Bioinformatics Analysis Of Prognosis Related Genes And Expression Of TOP2A In Non-small Cell Lung Cancer
10	Meta-Analysis Of mRNA Expression Profiles To Identify Key Genes And Pathways In Ovarian Cancer