
Research On Fast Migration Algorithm Between Reference Gene Compression Libraries

Posted on: 2018-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhang
GTID: 2350330536956285
Subject: Computer Science and Technology
Abstract/Summary:
Sequencing costs are falling, and emerging applications such as precision medicine and deep learning on genomes are driving demand, so genetic data is growing explosively. With data at this scale, how to store and transmit it efficiently has become a research hotspot. Reference-based compression algorithms are widely used in gene databases because of their high compression ratios. However, these algorithms depend on a reference genome. As a result, compressed data produced with one reference is hard to share, merge, or transfer. This thesis addresses the problem that compressed gene databases built on different reference genomes cannot be shared directly. It proposes a set of methods for quickly converting data that has already been compressed against one reference genome so that it refers to another. Unlike the traditional approach, these methods do not decompress and then recompress the data. The main contributions are as follows:

(1) Gene compression algorithms are classified, and the characteristics of each class are discussed. Several of the latest reference-based genome compression algorithms are analyzed in detail and compared systematically through experiments.

(2) A fast reference conversion algorithm is proposed for compressed gene data sets that were produced by the same compression algorithm but against different reference genomes. The algorithm exploits the similarity between the reference sequences to migrate data quickly from one reference to another. Experimental results show that its migration time is much lower than that of the original method, which decompresses and then recompresses the data. This algorithm also lays the groundwork for further study.

(3) For the case where different compression algorithms use different reference sequences, three compression algorithms are considered. Their characteristics are exploited to improve the compression ratio of the data after migration. Two migration algorithms are designed to cover the three compression algorithms, each supporting bidirectional migration between a pair of them. Their efficiency is verified on a large number of data sets.

(4) Finally, TReC, a genetic data management framework covering compression, migration, and decompression, is designed for the Loongson platform, and its performance is profiled. Guided by the profiling results, multi-process techniques are used to overlap computation with disk I/O, which speeds up compression and conversion.

Building on reference-based genome compression algorithms, this thesis proposes two effective migration algorithms that substantially reduce migration time. These techniques ease mutual migration between gene databases compressed against different reference genomes, and the study also provides experience for follow-up research.
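Contribution (2) migrates compressed data from one reference to another by exploiting the similarity between the references, without decompressing the data. The abstract does not specify a data format or the algorithm itself, so the following is only a minimal sketch under stated assumptions. Each compressed sequence is assumed to be a list of hypothetical tokens: either a match `("M", position, length)` that copies a run of the reference, or a literal `("L", bases)`. The similarity between the two references is assumed to be available as reference R1 itself encoded against reference R2 in the same token format. Migration then composes the two encodings: each match into R1 is rewritten through the R1 segments it overlaps, so the target sequence is never decoded. The names `decode`, `migrate`, and the token layout are illustrative only, not the TReC interface.

```python
from bisect import bisect_right
from typing import List, Tuple, Union

# Hypothetical token format (an assumption, not the thesis's format):
#   ("M", ref_pos, length) -> copy `length` bases of the reference from ref_pos
#   ("L", text)            -> emit literal bases
Token = Union[Tuple[str, int, int], Tuple[str, str]]


def decode(tokens: List[Token], ref: str) -> str:
    """Rebuild a sequence from its reference-based encoding."""
    out = []
    for tok in tokens:
        if tok[0] == "M":
            _, pos, length = tok
            out.append(ref[pos:pos + length])
        else:
            out.append(tok[1])
    return "".join(out)


def token_len(tok: Token) -> int:
    """Number of bases a token expands to."""
    return tok[2] if tok[0] == "M" else len(tok[1])


def append_token(out: List[Token], tok: Token) -> None:
    """Append tok, merging it into the previous token when they are contiguous."""
    if token_len(tok) == 0:
        return
    if out:
        prev = out[-1]
        if prev[0] == "L" and tok[0] == "L":
            out[-1] = ("L", prev[1] + tok[1])
            return
        if prev[0] == "M" and tok[0] == "M" and prev[1] + prev[2] == tok[1]:
            out[-1] = ("M", prev[1], prev[2] + tok[2])
            return
    out.append(tok)


def migrate(target_vs_r1: List[Token], r1_vs_r2: List[Token]) -> List[Token]:
    """Re-express an encoding against R1 as an encoding against R2.

    r1_vs_r2 is reference R1 encoded against reference R2 (the assumed
    representation of reference similarity). The target is never decoded.
    """
    # Start offset, in R1 coordinates, of every token of r1_vs_r2.
    starts, offset = [], 0
    for tok in r1_vs_r2:
        starts.append(offset)
        offset += token_len(tok)

    out: List[Token] = []
    for tok in target_vs_r1:
        if tok[0] == "L":
            append_token(out, tok)  # literals do not depend on the reference
            continue
        _, pos, length = tok
        end = pos + length
        i = bisect_right(starts, pos) - 1  # R1 token that contains pos
        while pos < end:
            seg, seg_start = r1_vs_r2[i], starts[i]
            a = pos - seg_start                                   # clip start
            b = min(end, seg_start + token_len(seg)) - seg_start  # clip end
            if seg[0] == "M":
                append_token(out, ("M", seg[1] + a, b - a))  # shift into R2
            else:
                append_token(out, ("L", seg[1][a:b]))  # R1 base absent from R2
            pos = seg_start + b
            i += 1
    return out
```

For example, with R2 = `"ACGTACGTTTGACCA"`, R1 encoded as `[("M", 0, 5), ("L", "GG"), ("M", 7, 8)]`, and a target encoded against R1 as `[("M", 2, 7), ("L", "N"), ("M", 10, 5)]`, `migrate` returns `[("M", 2, 3), ("L", "GG"), ("M", 7, 2), ("L", "N"), ("M", 10, 5)]`, which decodes against R2 to the same sequence `"GTAGGTTNGACCA"`. Apart from copying literal bases, the work scales with the number of tokens rather than with the decoded sequence length. A real compressor would still have to re-encode this token stream into its own output format, which the sketch does not attempt.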
Keywords/Search Tags: Reference-based compression, DNA sequence compression, Reference sequence conversion, FASTA, Loongson