Lossless Reference DNA Data Compression Method Based On ICBDS Optimization

Posted on:2020-08-14

Degree:Master

Type:Thesis

Country:China

Candidate:S W Du

Full Text:PDF

GTID:2370330620951114

Subject:Computer Science and Technology

Abstract/Summary:

DNA is a polymer that stores the genetic information of living things, and DNA research has become a highly active field. With the continuous development of high-throughput sequencing technology, sequencing costs keep falling and sequencing cycles keep shortening, so the volume of DNA data is growing exponentially. Storing massive DNA data in a small amount of space under limited resources has become a new challenge for biologists and computer scientists. In recent years, some DNA data compression methods have been proposed to increase the compression ratio, and others to reduce compression time. The compression method proposed by Nour and Amr has a large advantage in compression time over earlier methods, but is limited to bacterial DNA data. In this paper, the RU (recently used) transform and the MG (merged) transform are proposed to improve that method, and two improved step-by-step compression methods are given. Each method consists of two compression stages. The main work of this paper is as follows:

(1) An RU transformation is proposed for DNA data compression. The first compression stage performs a series of operations on the DNA data. First, the DNA data is converted into a binary file containing only 0 and 1 and a base sequence file in which adjacent characters differ. Next, the base sequence file is converted by the RU transformation into a file of small integers, which is then converted into a binary file using the idea of Huffman coding. Finally, all the binary files are converted into ordinary character files. In the second compression stage, the general-purpose text compression algorithm LZ77 is used to compress all the resulting character files together.

(2) An MG transformation is proposed for DNA data compression. The first compression stage performs a series of operations on the DNA data. First, the DNA data is converted into a binary file containing only 0 and 1 and a base sequence file containing only three characters. Next, the MG transformation converts the base sequence file into a binary file and a base sequence file of half the length, and the resulting base sequence file is converted into a binary file using the idea of Huffman coding. Finally, all the binary files are converted into ordinary character files. In the second compression stage, the general-purpose text compression algorithm LZ77 is used to compress all the resulting character files together.

The two compression methods are evaluated experimentally on test data from the GenBank database that is commonly used for DNA data compression algorithms. Compared with the method of Nour and Amr, the experimental results show the following. For bacterial DNA data, the RU-based method saves more than 70% of both compression time and decompression time, but its compression rate is reduced by 1.5% on average; the MG-based method saves more than 50% of both compression time and decompression time, but its compression rate is reduced by 0.5% on average. For non-bacterial DNA data, both methods improve the compression rate while saving more than 20% of compression time and decompression time.
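
The abstract names the RU (recently used) transform but does not define it. As a loosely analogous illustration only, the sketch below uses the classic move-to-front transform, which also turns a symbol stream into small integers based on how recently each symbol was seen. It then applies zlib, whose DEFLATE format combines LZ77 with Huffman coding, as a stand-in for the general-purpose second stage. The function names, the ACGT alphabet, and the choice of zlib are assumptions made for this example and are not taken from the thesis.

```python
import zlib


def mtf_encode(seq, alphabet="ACGT"):
    """Move-to-front: emit each symbol's current position, then move it to the front.

    Recently used symbols sit near the front, so they map to small integers.
    This is a stand-in for the RU transform, not its actual definition.
    """
    table = list(alphabet)
    out = []
    for ch in seq:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(indices, alphabet="ACGT"):
    """Invert mtf_encode by replaying the same table updates."""
    table = list(alphabet)
    out = []
    for idx in indices:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


def second_stage(indices):
    """Hypothetical second stage: pack the small integers as bytes and DEFLATE them."""
    return zlib.compress(bytes(indices))


def undo_second_stage(blob):
    """Recover the integer sequence from the compressed bytes."""
    return list(zlib.decompress(blob))


if __name__ == "__main__":
    dna = "AACCGGTTAACCGGTT" * 8
    blob = second_stage(mtf_encode(dna))
    restored = mtf_decode(undo_second_stage(blob))
    assert restored == dna  # the round trip is lossless
```

The round-trip check at the end reflects the lossless requirement stated in the title: whatever transform is used in the first stage, it has to be exactly invertible before a general-purpose compressor is applied.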

Keywords/Search Tags:

DNA Data, RU, MG, LZ77 Algorithm, Compression, Decompression

Related items

1	Research And Implementation Of Data Compression Method For Submarine Acoustic Detection Based On LZW
2	Research On High Performance Biological Data Compression Algorithm Based On Heterogeneous Computing Platform
3	The Research And Applications Of The Model Of WebGIS In Logistics Field
4	Research On The Compression Algorithm Of Social Network Graph
5	A Study Of The Lossless Compression Algorithm Of Vector Map Data
6	Research On Compression Algorithm For Vector Data In GIS
7	Oil Seismic Monitoring Data Compression Method Study
8	High-throughput Genome Resequencing Data Compression Algorithm Based On Self-index Structure
9	Research On Point Cloud Compression Algorithm Based On Geometric Feature Constraint
10	The Research Of Reference-based Compression Specified For Sequence Data