
Research on Gene Ontology Term Similarity Calculation Methods

Posted on: 2020-07-28; Degree: Master; Type: Thesis
Country: China; Candidate: Z Tang; Full Text: PDF
GTID: 2370330578968182; Subject: Software engineering
Abstract/Summary:
Gene Ontology (GO) is a bioinformatics ontology that represents biological knowledge and describes the functions of genes and gene products. GO consists of three separate categories: molecular function, biological process, and cellular component. This thesis studies two problems: calculating the similarity between GO terms, and automatically extending GO with new terms. Existing term similarity algorithms still have many shortcomings, chiefly that they do not make full use of the available information. In recent years, gene function networks have been introduced into term similarity calculation, but these approaches consider only directly connected genes in the network and neglect indirect relationships, so the available information is still underused. In addition, most of GO is currently constructed manually, which greatly increases the workload. With the advancement of biotechnology and the continuous growth in data volume, what is needed is an algorithm that can extend GO terms accurately and automatically and thereby reduce that workload. The main contributions of this thesis are an improved similarity algorithm that raises the accuracy of similarity calculation, and a term extension algorithm:

(1) This thesis proposes a Random Walk with Restart-based similarity measure (RWRSM) that incorporates a Gaussian kernel function. The proposed algorithm can capture the global structure of the functional network. In several experiments on yeast genes grouped by EC (Enzyme Commission) number, the algorithm achieves the highest LFC (Logged Fold Change) score in 84.6% of the 104 EC groups. The tests show that the proposed algorithm can improve the accuracy and stability of gene functional similarity calculation in gene ontology.

(2) GO is a widely used resource for describing the properties of gene products. Because maintaining it requires complex logical reasoning and biological knowledge that is not explicitly stated in GO, automatic maintenance of GO remains difficult. Existing research either builds an entire GO from network data or only infers relationships between existing GO terms; neither is used to automatically add new terms to the existing GO. We propose a new algorithm, GO-Extension, that efficiently identifies all connected gene pairs annotated with the same ancestor term. GO-Extension is used to predict new GO terms from biological network data and connect them to the existing GO. In the biological process branch experiment, the data for 2007, 2009, 2011, and 2013 contained 190, 239, 272, and 284 validation terms, respectively, and GO-Extension predicted 183, 265, 289, and 282 terms. According to the experimental results, GO-Extension can automatically extend GO with new terms based on biological networks.
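To illustrate how a random walk with restart can capture indirect relationships in a gene function network, as item (1) describes, here is a minimal generic sketch. It is not the RWRSM algorithm of the thesis and omits the Gaussian kernel step, whose details the abstract does not give. The function name, the restart probability of 0.5, and the four-gene chain network are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic RWR on a weighted adjacency matrix (list of lists).

    Illustrative only: `restart`, `tol`, and `max_iter` are assumed
    defaults, not values taken from the thesis.
    """
    n = len(adj)
    # Column-normalize so each column holds the walker's transition
    # probabilities out of that node.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Restart vector: all probability mass on the seed gene.
    e = [0.0] * n
    e[seed] = 1.0
    p = e[:]
    for _ in range(max_iter):
        # One step: walk along an edge with probability (1 - restart),
        # otherwise jump back to the seed.
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * e[i]
               for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical network: four genes in a chain, 0 - 1 - 2 - 3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(chain, seed=0)
print(scores)
```

In this toy network, genes 2 and 3 are not directly connected to the seed gene 0, yet they still receive nonzero scores that shrink with distance. A measure that looks only at directly connected genes would miss this kind of indirect signal.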
Keywords/Search Tags: Gene ontology, term similarity, gene function network, term extension