| Identifying disease genes is an important step in moving genomics from scientific research to application. Among the large body of related work, the screening of potential disease genes is a current research focus in bioinformatics. Determining disease genes takes a long time: potential disease genes must first be screened out, and biological experiments must then confirm which of them are true disease genes. Traditional screening methods, such as positional cloning, genome-wide association analysis, and linkage analysis, yield a large number of candidate genes but few real disease genes. The core problem in screening potential disease genes is to find as many disease genes as possible, accurately and without omission. This paper studies the application of two data sources, gene ontology and biological pathways, to the screening of potential disease genes. Each data source is used to calculate gene functional similarity, and a machine learning classification model is then used to screen potential disease genes. Compared with traditional methods, fewer potential disease genes are selected, which can shorten the time and reduce the cost of disease gene determination. The main work includes: (1) An improved method for identifying disease genes based on gene ontology is proposed. Existing methods assume that "disease genes will gather on the biological process branch of the gene ontology." 
This paper instead assumes that "disease genes will be aggregated on all branches of the gene ontology" and proposes a full branch aggregation method (Full Branch Aggregation, FBA). All branches of the gene ontology are used when calculating gene ontology term similarity and gene functional similarity. Experiments were conducted on an autism spectrum disorder disease gene data set with four gene ontology term similarity algorithms: Resnik, Rel, Wang, and Netsim. The results show that the improved method raises the average recognition accuracy from 72% to 78% and the highest classification accuracy from 79.3% to 91.4%. (2) A disease gene recognition method based on biological pathways is proposed. The Pathcard biological pathway database is used to calculate the functional similarity between genes from the associations between genes and biological pathways. Experiments were conducted on autism spectrum disorder disease genes. The results show an accuracy of 95.98%, a precision of 93.94%, and a recall above 98%, indicating that the pathway-based method can effectively identify disease genes. Compared with the gene ontology-based method, its precision (93.94%) is slightly lower than the 97.96% of the gene ontology-based method, while its recall (98.30%) is higher than the 83.84% of the gene ontology-based method. This indicates that the pathway-based method misjudges some non-disease genes as disease genes but omits fewer disease genes. In summary, this paper studies two types of data widely used in bioinformatics, gene ontology and Pathcard: it improves the existing gene ontology-based methods, examines the feasibility of using Pathcard to calculate gene functional similarity, and applies it to disease gene classification. Because of the limits of the classification model used, only the disease genes of one disease, autism spectrum disorder, were studied; the next step is to modify the model so that it applies to the genes of other diseases. |
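
The pathway-based similarity and the reported evaluation metrics can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the Jaccard index stands in for the unspecified similarity measure, the gene and pathway names are hypothetical placeholders, and the confusion-matrix counts are toy values rather than the paper's results.

```python
from typing import Dict, Set


def pathway_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two pathway sets.

    Assumption: the abstract does not name the similarity measure,
    so Jaccard is used here purely for illustration.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision falls when non-disease genes are misjudged as disease genes.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall falls when true disease genes are omitted.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical gene-to-pathway associations (placeholder names, not real data).
gene_pathways: Dict[str, Set[str]] = {
    "GENE_A": {"pathway_1", "pathway_2", "pathway_3"},
    "GENE_B": {"pathway_2", "pathway_3", "pathway_4"},
    "GENE_C": {"pathway_5"},
}

sim_ab = pathway_similarity(gene_pathways["GENE_A"], gene_pathways["GENE_B"])
sim_ac = pathway_similarity(gene_pathways["GENE_A"], gene_pathways["GENE_C"])

# Toy confusion-matrix counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=9, fp=1, tn=8, fn=2)

print(f"sim(A, B) = {sim_ab:.2f}, sim(A, C) = {sim_ac:.2f}")
print({name: round(value, 4) for name, value in metrics.items()})
```

In this framing, a method with lower precision but higher recall, as reported for the pathway-based approach, produces more false positives and fewer false negatives, which suits a screening step whose candidates are later confirmed by biological experiments.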