| Disease gene prediction is a core problem in bioinformatics. Identifying disease genes is the basis for understanding disease pathogenesis and for supporting clinical decisions. With the development of high-throughput experiments, biological data have grown rapidly, and a growing number of computational approaches mine the relationships between diseases and genes. In recent years, many disease gene prediction methods based on functional similarity have been proposed. However, of the more than 26,000 currently known genes, the functions of 42% remain unknown, so a true candidate gene can be missed if it lacks sufficient annotations. Because proteins are the main biological actors, protein expression is more closely related to gene function. We therefore integrated newly released proteomics data with multiple omics data to predict gene functions and disease genes. In this work, we propose a novel method, Pemo, which integrates multiple omics data with protein expression data to predict gene function. Human protein expression data obtained from mass spectrometry were used to predict gene functions. Two matrices were built: a Pearson correlation coefficient matrix and a function probability matrix of Gene Ontology (GO) term annotations. Genes related to the unannotated genes were filtered using gene-gene interaction data, and genes that rarely interact with other genes were removed. GO term scores were generated by multiplying the two matrices, and GO terms were annotated according to their ranked scores. Pemo integrates multiple omics data, including protein expression, protein sequences, RNA-Seq and gene interaction network data, into a Naive Bayes framework to predict gene functions. Compared with the other omics data, protein expression performs best in function prediction. Integrating multiple omics data yields higher accuracy than using a single omics dataset, and adding protein expression data clearly improves prediction accuracy. Pemo was also compared with other function prediction methods and provided the best recovery of annotation terms.
Next, the relationships between diseases and genes were predicted based on gene functional similarity and protein expression. We calculate the relevance between diseases and GO terms and mine the relationships among GO terms in the GO directed acyclic graph. When calculating the similarity between two GO terms, both the distance between them and the intersection of their parent node sets are considered. Disease genes were successfully predicted with a lower false positive rate. We present prediction results for four diseases: stomach cancer, lung cancer, breast cancer and congenital heart disease. Compared with other disease gene prediction methods, our method achieved better results. Some potential disease genes were also predicted, and these require further experimental validation. |
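The GO term scoring step (Pearson correlation matrix multiplied by the function probability matrix, then ranked) can be sketched as follows. This is a minimal sketch under stated assumptions, not Pemo's implementation. The function names and the dict-based inputs are hypothetical. Two further readings are assumptions: that "filtered by gene-gene interactions" means keeping the annotated interaction partners of the query gene, and that a `min_degree` cutoff captures "rarely interacts". Multiplying the query gene's row of the Pearson matrix by the function probability matrix amounts to summing, for each GO term, correlation × annotation probability over the retained genes.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 (assumption)
    return cov / (sx * sy)


def score_go_terms(query, expression, annotation, interactions, min_degree=2):
    """Rank GO terms for an unannotated `query` gene (hypothetical helper).

    expression:   dict gene -> list of protein abundances over the same samples
    annotation:   dict gene -> dict GO term -> probability the gene carries the term
    interactions: dict gene -> set of interacting genes
    """
    # Assumed filter: keep annotated interaction partners of the query that have
    # expression data and at least `min_degree` interactions of their own.
    candidates = [
        g for g in interactions.get(query, set())
        if g in annotation and g in expression
        and len(interactions.get(g, ())) >= min_degree
    ]
    # Query row of the Pearson matrix times the function probability matrix.
    scores = {}
    for g in candidates:
        r = pearson(expression[query], expression[g])
        for term, p in annotation[g].items():
            scores[term] = scores.get(term, 0.0) + r * p
    # Annotate by sorting GO terms on their scores, highest first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, a partner gene whose protein profile rises with the query's pushes its GO terms up the ranking, while an anti-correlated partner pushes its terms down.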
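The Naive Bayes integration of several omics sources can be illustrated for a single gene/GO-term pair. This sketch assumes each source (for example protein expression, protein sequence, RNA-Seq or the interaction network) has already been summarized as a likelihood ratio. How Pemo estimates those per-source quantities is not described here, so the inputs and the function name `naive_bayes_posterior` are hypothetical. Under the Naive Bayes assumption, the sources are conditionally independent given the label, so the posterior odds equal the prior odds times the product of the likelihood ratios.

```python
import math


def naive_bayes_posterior(prior, likelihood_ratios):
    """Combine per-source evidence for one gene/GO-term pair (hypothetical helper).

    prior: P(gene has term) before any omics evidence, strictly between 0 and 1.
    likelihood_ratios: dict source -> P(evidence | has term) / P(evidence | lacks term).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    # Work in log-odds so many multiplied ratios do not under- or overflow.
    log_odds = math.log(prior / (1.0 - prior))
    for source, lr in likelihood_ratios.items():
        if lr <= 0:
            raise ValueError(f"likelihood ratio for {source!r} must be positive")
        log_odds += math.log(lr)  # conditional independence: ratios multiply
    # Convert posterior log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A ratio above 1 from any source raises the posterior and a ratio below 1 lowers it. This is why adding an informative source such as protein expression can sharpen the combined prediction.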
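The GO term similarity described above combines the distance between two terms with the intersection of their parent node sets. It can be sketched as follows. The abstract does not give the formula, so these specific choices are assumptions. Ancestors are collected by breadth-first search over a hypothetical `parents` mapping. Overlap is the Jaccard index of the two ancestor sets, each including the term itself. Distance is the shortest path joining the terms through a shared ancestor. The two are combined as `overlap / (1 + distance)`.

```python
from collections import deque


def ancestor_distances(term, parents):
    """BFS upward through the GO DAG.

    Returns {ancestor: fewest edges from term}, including the term itself at distance 0.
    parents: dict GO term -> set of direct parent terms (hypothetical input format).
    """
    dist = {term: 0}
    queue = deque([term])
    while queue:
        t = queue.popleft()
        for p in parents.get(t, ()):
            if p not in dist:  # first visit in BFS is the shortest distance
                dist[p] = dist[t] + 1
                queue.append(p)
    return dist


def go_similarity(t1, t2, parents):
    """Hypothetical similarity mixing ancestor-set overlap and path distance."""
    a1 = ancestor_distances(t1, parents)
    a2 = ancestor_distances(t2, parents)
    common = a1.keys() & a2.keys()
    if not common:
        return 0.0  # no shared ancestor: treat the terms as unrelated
    # Intersection of the parent node sets, normalized as a Jaccard index.
    overlap = len(common) / len(a1.keys() | a2.keys())
    # Shortest path between the two terms passing through a shared ancestor.
    distance = min(a1[c] + a2[c] for c in common)
    # Assumed combination: more shared ancestors and a shorter path mean higher similarity.
    return overlap / (1 + distance)
```

With a root `GO:0`, children `GO:1` and `GO:2`, and grandchildren `GO:3` and `GO:4` under `GO:1`, the siblings `GO:3` and `GO:4` score higher than `GO:3` and `GO:2`, which meet only at the root.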