
Predicting The Risky SNP Of T2D Based On Feature Classification And Lazy Restart Random Walk

Posted on: 2019-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: F L Zhang
GTID: 2494305474474244
Subject: Software engineering
Abstract/Summary:
Type 2 diabetes (T2D) is one of the most common complex genetic diseases in humans. With continuing advances in biotechnology, genome-wide association studies (GWAS) have identified many T2D-associated single-nucleotide polymorphisms (SNPs) and genes. Using computational biology, researchers have found that about half of T2D-related genetic changes are caused by SNPs, and these GWAS-associated SNPs contribute to disease susceptibility by affecting miRNA binding and protein phosphorylation. Many methods for predicting the phenotypic effects of SNPs have already been developed, most of them based on evolutionary information from protein sequences or on protein structural information. However, their accuracy, sensitivity and other performance indicators remain unsatisfactory. To address these problems, this thesis proposes a machine-learning method that predicts T2D risk loci by combining feature classification with a lazy restart random walk.

First, known GWAS-associated SNPs for T2D, bone mineral density (BMD) and obesity are described by their positional features, and four classification algorithms (support vector machine (SVM), logistic regression, decision tree and random forest) are compared on them to build a SNP classifier. A method for predicting T2D risk loci is then built by combining three sources of information: the relationships among the genes associated with the risk SNPs, a protein-protein interaction (PPI) network of T2D GWAS SNP-associated genes, and a PPI network of T2D-related genes. The classifier assigns each candidate GWAS-associated SNP to a class, which gives an initial decision on whether it is a T2D risk locus. Using the classifier alone, however, produces some false positives and false negatives. To address this, the thesis proposes a lazy restart random walk method (RWLR) based on the Markov property: each candidate SNP is further assessed by checking whether its associated gene is related to the T2D GWAS SNP-associated genes and to known T2D causative genes. We first construct the PPI network of T2D GWAS SNP-associated genes. We then select T2D disease genes with a t-test and use the maximal information coefficient (MIC) to measure the correlations between them, yielding a PPI network of T2D genes on which the lazy restart random walk scores and ranks the genes associated with the candidate SNPs. A SNP is confirmed as a T2D risk locus if the final score of its associated gene exceeds a threshold and the gene is related to T2D.

Finally, the whole prediction method is validated with cross-validation and ROC curves. The results show that the method is sound and accurate: it outperforms the RWRH and RWMC algorithms and is more reliable than classification-based SNP prediction alone. Based on these results, we predict T2D risk loci, providing a more effective route for further research into how SNPs contribute to T2D susceptibility.
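The t-test step that selects T2D disease genes can be sketched as follows. This is a minimal sketch, assuming a gene-by-sample expression table with case/control labels; it uses Welch's unequal-variance t statistic and a hypothetical cut-off of |t| >= 3.0, since the abstract states neither the test variant nor the cut-off. The function names and gene names are invented for illustration. A real analysis would turn t into p-values (for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`) and correct for multiple testing.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's two-sample t statistic (unequal variances); each group needs >= 2 values."""
    diff = mean(cases) - mean(controls)
    se2 = variance(cases) / len(cases) + variance(controls) / len(controls)
    if se2 == 0:
        # Both groups are constant: no difference gives 0, otherwise an infinitely large t.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def select_disease_genes(expression, labels, t_threshold=3.0):
    """Keep genes whose |t| between cases (label 1) and controls (label 0) reaches t_threshold.

    expression: dict gene -> list of expression values, one per sample
    labels: list of 0/1 sample labels aligned with every expression list
    t_threshold: hypothetical cut-off, not taken from the thesis
    """
    selected = {}
    for gene, values in expression.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        t = welch_t(cases, controls)
        if abs(t) >= t_threshold:
            selected[gene] = t
    return selected


# Hypothetical toy data: 4 T2D cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "GENE_A": [5.1, 5.3, 4.9, 5.2, 2.0, 2.2, 1.9, 2.1],  # clearly higher in cases
    "GENE_B": [3.0, 3.2, 2.9, 3.1, 3.1, 2.9, 3.0, 3.2],  # no real difference
}
selected = select_disease_genes(expression, labels)
print(sorted(selected))  # ['GENE_A']
```

Only genes passing the filter would be carried forward as nodes of the T2D gene network.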
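MIC measures how strongly two genes' profiles are related, including non-linear relations, and its values can serve as edge weights when building the T2D gene network. The sketch below is a simplified approximation, not the full MIC algorithm: it keeps the idea of maximising normalised mutual information over grids with at most n^0.6 cells, but places bin edges at fixed quantiles instead of searching for the optimal partition, so it can underestimate the true MIC. `approx_mic` and its helpers are hypothetical names; in practice a complete implementation, such as the one in the `minepy` package, would be preferable.

```python
import math
import random
from collections import Counter


def rank_bins(values, k):
    """Put each value into one of k equal-frequency bins by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // n
    return bins


def mutual_information(a, b):
    """Mutual information in nats between two equally long sequences of bin labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log((c / n) / ((count_a[i] / n) * (count_b[j] / n)))
        for (i, j), c in joint.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC: best normalised MI over quantile grids with nx * ny <= n ** alpha.

    Unlike the real MIC, bin boundaries are fixed at quantiles instead of optimised.
    """
    n = len(x)
    budget = max(4, int(n ** alpha))
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        for ny in range(2, budget // nx + 1):
            mi = mutual_information(rank_bins(x, nx), rank_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best


xs = list(range(100))
rng = random.Random(0)
noise = [rng.random() for _ in xs]
print(approx_mic(xs, [v * v for v in xs]))  # monotone relation: about 1.0
print(approx_mic(xs, noise))                 # independent data: close to 0
```

Normalising by log(min(nx, ny)) keeps every score between 0 and 1, so scores from different gene pairs are comparable.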
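The lazy restart random walk that scores and ranks candidate genes can be sketched as below. The abstract does not give the RWLR update rule, so this is a minimal sketch assuming a common lazy formulation: at each step the walker stays put with probability `laziness`, otherwise moves to a neighbour in proportion to edge weight (for example a MIC score), and the whole distribution is pulled back toward the seed genes with probability `restart`. The function name, parameter values and toy network are hypothetical.

```python
def lazy_restart_walk(graph, seeds, restart=0.3, laziness=0.5, tol=1e-10, max_iter=1000):
    """Lazy random walk with restart on a weighted, undirected gene network.

    graph: dict node -> dict neighbour -> positive edge weight (store each edge both ways)
    seeds: nodes to restart from (e.g. known T2D genes); all must be nodes of graph
    One step: p <- (1 - restart) * M p + restart * p0, where M = laziness * I + (1 - laziness) * W
    and W moves the walker to a neighbour with probability proportional to edge weight.
    """
    nodes = list(graph)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        step = {v: 0.0 for v in nodes}
        for u in nodes:
            total_w = sum(graph[u].values())
            if total_w == 0:
                step[u] += p[u]  # isolated node: the walker has nowhere to go
                continue
            step[u] += laziness * p[u]  # lazy part: stay at the current node
            for v, w in graph[u].items():
                step[v] += (1.0 - laziness) * p[u] * w / total_w
        new_p = {v: (1.0 - restart) * step[v] + restart * p0[v] for v in nodes}
        delta = sum(abs(new_p[v] - p[v]) for v in nodes)  # L1 change between iterations
        p = new_p
        if delta < tol:
            break
    return p


# Hypothetical 4-gene path network A - B - C - D, restarting from A.
path = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = lazy_restart_walk(path, {"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

The scores sum to 1 and decay with network distance from the seeds. To mirror the final decision step, a candidate SNP's associated gene would be compared with a cut-off such as `scores[gene] > threshold`; the threshold value used in the thesis is not stated in the abstract.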
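Both the classifier comparison and the final validation rest on cross-validated ROC analysis. The sketch below, using only the standard library, computes the area under the ROC curve with the rank-pair (Mann-Whitney) formulation and averages it over stratified folds. It is an illustration under assumptions: `use_first_feature` is a hypothetical placeholder for a real classifier such as SVM, logistic regression, a decision tree or a random forest, the toy data are invented, and the fold count k = 5 is not taken from the thesis.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(features, labels, train_and_score, k=5, seed=0):
    """Mean held-out AUC; train_and_score(train_X, train_y, test_X) returns scores for test_X."""
    aucs = []
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        scores = train_and_score(
            [features[i] for i in train], [labels[i] for i in train], [features[i] for i in fold]
        )
        aucs.append(roc_auc([labels[i] for i in fold], scores))
    return sum(aucs) / len(aucs)


def use_first_feature(train_X, train_y, test_X):
    """Hypothetical stand-in for a trained classifier: score = first feature value."""
    return [x[0] for x in test_X]


# Toy, perfectly separable data: 10 risk SNPs (label 1) with higher feature values.
labels = [1] * 10 + [0] * 10
features = [[20 - i] for i in range(20)]
cv_auc = cross_validated_auc(features, labels, use_first_feature)
print(cv_auc)  # 1.0
```

With scikit-learn installed, `cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5), scoring="roc_auc")` from `sklearn.model_selection` gives the same kind of estimate for each of `SVC`, `LogisticRegression`, `DecisionTreeClassifier` and `RandomForestClassifier`, which is a convenient way to compare the four classifiers named above.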
Keywords/Search Tags: T2D, Machine learning, Lazy restart model, Risky SNP prediction, Random walk