
Identification Method For Risky SNP Of Osteoporosis

Posted on: 2018-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: H C Gu
GTID: 2370330518982357
Subject: Computer software and theory
Abstract/Summary:
The rise and rapid development of bioinformatics has driven innovation in biotechnology and produced huge amounts of biological data. A central direction for the field is to analyze these data and extract valuable information from them. In human life science, the primary task facing researchers is to use bioinformatics techniques to find the pathogenic factors of complex diseases, which provides a theoretical basis for treating them. Osteoporosis is a common complex genetic disease. Over the past 20 years, much progress has been made in its genetic analysis, and many genes and SNPs associated with osteoporosis have been found through genome-wide association studies (GWAS). We analyzed the osteoporosis GWAS-associated SNPs and genes with bioinformatics tools and found interactions among the genes and SNPs related to mesenchymal stem cell differentiation and metabolism in osteoporosis. Based on this analysis, we assumed that SNPs similar to the osteoporosis GWAS-associated SNPs are possible risk SNPs for osteoporosis.

The identification method has two steps. First, we determined whether the genes associated with the suspected risk SNPs were themselves associated with osteoporosis. We collected the osteoporosis GWAS-associated genes as the training set and constructed a protein-protein interaction (PPI) network containing both these genes and the genes associated with the suspected risk SNPs. We then ranked the genes associated with the suspected risk SNPs using a Markov-chain-based random walk on the PPI network. A threshold was applied to the resulting scores: a gene associated with a suspected risk SNP is considered a probable osteoporosis-associated gene if its score exceeds the threshold. Second, we classified the SNPs with positive results from the first step using the ID3 decision tree algorithm with Pessimistic Error Pruning (PEP). We selected the suspected risk SNPs whose associated genes had been identified as osteoporosis-associated, and used the osteoporosis GWAS-associated SNPs together with their locus features as the training set. The suspected risk SNPs were then classified by their locus features with a classification tree built by ID3 with PEP, and the SNPs that the tree classifies as positive are identified as osteoporosis risk SNPs.

Finally, we used the known osteoporosis GWAS-associated SNPs and type 2 diabetes GWAS-associated SNPs as the data set, verified both steps by 10-fold cross-validation, and evaluated the identification method with ROC curves. The results showed that the method is feasible. The method identifies osteoporosis risk SNPs automatically by algorithm and provides an efficient starting point for further study of the susceptibility conferred by osteoporosis SNPs.
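The gene-ranking step can be illustrated with a small sketch. The abstract does not say which random-walk variant the thesis uses, so the code below assumes a random walk with restart, a common choice for seed-based ranking on PPI networks. The function name, the restart probability of 0.3, the threshold value, and the node names (`SEED_A`, `CAND_NEAR`, and so on) are all hypothetical placeholders, not values from the thesis.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node of an undirected graph by proximity to the seed nodes.

    Assumed variant: at each step the walker returns to a seed with
    probability `restart`, otherwise moves to a uniformly chosen neighbor.
    """
    # Build adjacency sets for the undirected network.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)

    seeds_in_graph = {s for s in seeds if s in neighbors}
    if not seeds_in_graph:
        raise ValueError("no seed node is present in the network")

    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds_in_graph) if n in seeds_in_graph else 0.0)
          for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart mass first, then the mass pushed along each edge.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network (hypothetical): two seed genes, one candidate adjacent to both
# seeds, and one candidate three hops away.
edges = [
    ("SEED_A", "SEED_B"),
    ("SEED_A", "CAND_NEAR"),
    ("SEED_B", "CAND_NEAR"),
    ("CAND_NEAR", "HUB"),
    ("HUB", "CAND_FAR"),
]
scores = random_walk_with_restart(edges, seeds=["SEED_A", "SEED_B"])

# Rank candidate genes by score and keep those above a hypothetical threshold.
candidates = ["CAND_NEAR", "CAND_FAR"]
ranked = sorted(candidates, key=lambda g: scores[g], reverse=True)
threshold = 0.05  # illustrative only; the thesis does not state its threshold
selected = [g for g in ranked if scores[g] > threshold]
```

In this sketch the candidate adjacent to both seeds ranks above the distant one, which is the behavior the thresholding step relies on. In practice the threshold would presumably be tuned on the training set, for example through the cross-validation and ROC analysis the abstract mentions.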
Keywords: osteoporosis, SNP, random walk, ID3 decision tree, Pessimistic Error Pruning