Genome Structural Variation Detection Method Based On Neural Network

Posted on: 2016-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: F Qi  Full Text: PDF
GTID: 2180330473462640  Subject: Computer Science and Technology
Abstract/Summary:
With the development of new sequencing technologies, the cost of whole human genome sequencing is declining while sequencing speed keeps improving. With the resulting vast amounts of data, detecting structural variation more accurately has become a key problem for researchers. This paper follows the genome structural variation detection process and is divided into four parts:

1. Because the structural variation present in real whole-genome data is uncertain, this paper designs a process for generating simulation data in which real structural variations sequenced by the 1000 Genomes Project are inserted into the reference genome to form individual genomes, so that the simulation data are closer to real data. The experimental results show that the simulation data generated by the new process are closer to real data, which is of great significance for evaluating and further studying structural variation detection tools.

2. Indel detection on the simulation data with the tools SVseq, Pindel, SAMtools and VarScan shows that the overall performance of Pindel is better than that of the other three tools, both in the types of genome structural variation it detects and in sensitivity. It must be pointed out, however, that as sequencing coverage increases, the indel detection results of Pindel contain false positives. Under normal circumstances, genome structural variation detection tools decide whether a region contains a structural variation by counting the discordant read pairs that support the same variation. The required number of discordant read pairs is a threshold parameter that usually has to be set manually, which makes the detection results insufficiently accurate.

3. Neural network ensemble algorithms are widely used in sample classification. This paper proposes a new algorithm, the P-A detection strategy based on neural networks, which aims to reduce false positives and improve the accuracy of structural variation detection tools. The experimental results show that the P-A detection strategy based on feature extraction can effectively reduce the false discovery rate of genome structural variation and improve the sensitivity of the detection tools' results.

4. This paper puts forward a strategy that uses a neural network to detect genome structural variation, addressing the inaccuracy caused by failing to make full use of the structural variations that researchers have already sequenced. The strategy uses the numbers of concordant and discordant read pairs to judge automatically whether a region contains a structural variation. The experimental results show that the neural-network-based strategy can effectively determine the existence of deletions.
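
The contrast between a manually set discordant-pair threshold and a learned decision over concordant and discordant pair counts can be illustrated with a minimal Python sketch. It is not the thesis's method: the single sigmoid neuron below is only the smallest possible stand-in for a neural network, and the names `threshold_call`, `features`, `train`, `predict`, the default `min_support=3`, and the `TOY_WINDOWS` counts are all assumptions made up for illustration, not real sequencing data.

```python
import math

# Hypothetical toy windows: (concordant pairs, discordant pairs, label).
# label 1 = deletion present, 0 = no deletion. Invented for illustration only.
TOY_WINDOWS = [
    (30, 0, 0), (28, 1, 0), (60, 3, 0), (58, 4, 0), (12, 1, 0),
    (10, 12, 1), (8, 15, 1), (20, 25, 1), (18, 30, 1), (5, 6, 1),
]


def threshold_call(discordant, min_support=3):
    """Fixed rule: call a variant when enough discordant pairs support it."""
    return discordant >= min_support


def features(concordant, discordant):
    """Turn raw counts into fractions so the input does not grow with coverage."""
    total = concordant + discordant
    if total == 0:
        return (0.0, 0.0)
    return (concordant / total, discordant / total)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(windows, epochs=2000, lr=0.5):
    """Fit one sigmoid neuron with stochastic gradient descent on log loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for concordant, discordant, label in windows:
            x = features(concordant, discordant)
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p - label  # gradient of log loss w.r.t. the pre-activation
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def predict(model, concordant, discordant):
    """Return the neuron's estimated probability that the window has a deletion."""
    w, b = model
    x = features(concordant, discordant)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


model = train(TOY_WINDOWS)
```

In this toy setting, a high-coverage window with 60 concordant and 4 discordant pairs passes the fixed threshold (`threshold_call(4)` is true), while `predict(model, 60, 4)` stays below 0.5 because the learned decision weighs discordant pairs against concordant ones. This mirrors, in a very simplified way, the false positives at higher coverage described in part 2.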
Keywords/Search Tags:Neural Network Ensemble, Genome Structural Variation, Feature Extraction