
Barcode Design For DNA Library

Posted on: 2016-06-11; Degree: Master; Type: Thesis
Country: China; Candidate: H Q Qu
GTID: 2180330461998094; Subject: Breeding
Abstract/Summary:
This study designs barcodes for DNA library construction in high-throughput sequencing. The aims are to obtain more genotypic data from a single sequencing run, to reduce sequencing cost, and to provide genotypic data for subsequent genome-wide association studies and genomic breeding. The advent of gene sequencing greatly advanced the biological sciences: the DNA sequencing method invented by the British biochemist Sanger has underpinned countless research achievements over the past 30 years. The Human Genome Project drove the emergence and development of second-generation sequencing, which, thanks to its high automation, high throughput and low cost, had an enormous impact on the biomedical field within a few years. A barcode is a short sequence attached after the sequencing adapter; in sequencing analysis it identifies the sample from which each read originates. When barcodes are added during DNA library construction, samples can be pooled in one sequencing run and genotypic data for dozens or even hundreds of samples obtained, which significantly reduces sequencing cost. Published barcode sets include 96 from Cornell University, 12 from Illumina and 96 from Huazhong Agricultural University. The 96 Cornell barcodes are designed for the ApeKI enzyme, the 12 Illumina barcodes are only 6 bp long, and the 96 Huazhong Agricultural University barcodes are designed for the enzymes MseI and SacI. Existing research shows that different species require different restriction endonucleases for DNA library construction, so the published barcodes cannot fully meet current research needs and a barcode design program is needed. The open-source statistical language R is widely used in the Internet, pharmaceutical, environmental protection, gene sequencing and other industries, and can help with the problems of data storage and data query delay in the sequencing process.
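To make concrete how barcodes let pooled samples be separated after sequencing, here is a minimal Python sketch of exact-match demultiplexing. It is not part of the thesis (which works in R) and assumes each barcode sits at the 5' start of the read; the function name `demultiplex`, the barcode sequences and the sample names are all hypothetical.

```python
def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign each read to a sample by the barcode at its 5' end.

    barcodes maps barcode sequence -> sample name (hypothetical values).
    Barcodes are tried longest first, so a variable-length set resolves to the
    longest matching barcode. The barcode is trimmed from assigned reads;
    reads with no matching barcode are collected under "unassigned".
    """
    ordered = sorted(barcodes, key=len, reverse=True)
    groups: dict[str, list[str]] = {name: [] for name in barcodes.values()}
    groups["unassigned"] = []
    for read in reads:
        for bc in ordered:
            if read.startswith(bc):
                groups[barcodes[bc]].append(read[len(bc):])  # trim the barcode
                break
        else:  # no barcode matched this read
            groups["unassigned"].append(read)
    return groups


if __name__ == "__main__":
    pooled = ["ACGTCAGCTTT", "TGCAACAGCGG", "GGGGCAGC"]
    print(demultiplex(pooled, {"ACGT": "sample1", "TGCAA": "sample2"}))
```

Trying longer barcodes first matters when one barcode is a prefix of another. Production demultiplexers usually also tolerate a few sequencing errors; this sketch requires an exact match.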
This research used the R language for barcode design:
(1) Set the barcode length to 3 to 11 bp, replace A, T, G and C with the digits 1, 2, 3 and 4 respectively, and use R's permutation and combination statements to generate every digit combination.
(2) Convert the specific sequences into digit form and use R's delete statements to remove them, namely the start codon and special sequences such as stop codons.
(3) Remove repetitive sequences: delete sequences containing three consecutive identical bases, to ensure the balance and complexity of the barcodes.
(4) Write R statements for the recognition site of the chosen restriction enzyme and delete the combinations that contain it.
(5) Convert the digits of the remaining combinations back to base letters.
Barcode design was completed for lengths of 3 to 11 bp. The initial number of arrangements (4^n for length n) was 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576 and 4194304 for lengths of 3 to 11 bp respectively. After screening steps (2) and (3), 55, 191, 660, 2294, 7948, 27539, 95418, 330607 and 1145498 barcodes remained for lengths of 3 to 11 bp respectively.
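The five steps above can be sketched as follows. The thesis implements them in R; this is a minimal Python version of the same pipeline, not the author's script. The digit encoding follows the text, but the list of "specific sequences" (ATG plus the stop codons TAA, TAG, TGA) is an assumption, as are all function names. Under these assumptions the sketch keeps 56 of the 64 three-base codes, whereas the thesis reports 55, which suggests its screening lists differ slightly.

```python
from itertools import product

# Step (1) digit encoding described in the thesis: A->1, T->2, G->3, C->4.
ENCODE = {"A": "1", "T": "2", "G": "3", "C": "4"}
DECODE = {digit: base for base, digit in ENCODE.items()}

# Assumption: the "specific sequences" of step (2) are the start codon ATG and
# the three standard stop codons; the thesis may screen a different list.
SPECIFIC = ["ATG", "TAA", "TAG", "TGA"]


def encode(seq: str) -> str:
    """Convert bases such as 'ATG' to the digit form '123'."""
    return "".join(ENCODE[base] for base in seq)


def decode(code: str) -> str:
    """Step (5): convert a digit string back to bases."""
    return "".join(DECODE[digit] for digit in code)


def all_codes(n: int) -> list[str]:
    """Step (1): every length-n string over the digits 1-4 (4**n in total)."""
    return ["".join(p) for p in product("1234", repeat=n)]


def has_triple(code: str) -> bool:
    """Step (3): True if any symbol occurs three times in a row (e.g. AAA)."""
    return any(code[i] == code[i + 1] == code[i + 2] for i in range(len(code) - 2))


def design(n: int, enzyme_sites: list[str]) -> list[str]:
    """Run steps (1)-(5) for length n; enzyme_sites are given as plain bases."""
    codes = all_codes(n)
    banned = [encode(s) for s in SPECIFIC]
    codes = [c for c in codes if not any(b in c for b in banned)]  # step (2)
    codes = [c for c in codes if not has_triple(c)]  # step (3)
    sites = [encode(s) for s in enzyme_sites]
    codes = [c for c in codes if not any(s in c for s in sites)]  # step (4)
    return [decode(c) for c in codes]  # step (5)


if __name__ == "__main__":
    for n in range(3, 9):
        # SacI site GAGCTC used as an example enzyme site.
        print(n, len(all_codes(n)), len(design(n, ["GAGCTC"])))
```

Because each base maps to exactly one digit, a substring test on the digit strings is equivalent to one on the bases; Python could filter the letters directly, and the digits are kept only to mirror the thesis.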
Barcodes used in previous experiments range from 3 to 11 bp. Taking the 3 to 8 bp barcodes together, 55431 combinations remained after deleting the specific sequences, 38689 after also excluding repetitive sequences, and 38191 after deleting the enzyme recognition site (ApeKI, for example). These results provide a foundation for further barcode design. This study produced R scripts implementing the method, which are convenient to use.
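ApeKI recognizes the degenerate site G^CWGC (W = A or T), so screening for it in step (4) means removing barcodes that contain either GCAGC or GCTGC. The thesis does not say how its R script handles the degeneracy; the Python sketch below is one possible approach, and the partial IUPAC table and function names are assumptions of this example.

```python
from itertools import product

# Partial IUPAC ambiguity table (assumption: extend it for other enzymes).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "S": "CG", "N": "ACGT"}

APEKI_SITE = "GCWGC"  # ApeKI recognition site, W = A or T


def expand_site(site: str) -> list[str]:
    """Expand a degenerate recognition site into every concrete sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in site))]


def remove_site(barcodes: list[str], site: str) -> list[str]:
    """Step (4): drop barcodes containing any concrete form of the site."""
    concrete = expand_site(site)
    return [bc for bc in barcodes if not any(s in bc for s in concrete)]


if __name__ == "__main__":
    candidates = ["GCAGCT", "AGCTGCA", "ACGTAC"]
    print(expand_site(APEKI_SITE))  # ['GCAGC', 'GCTGC']
    print(remove_site(candidates, APEKI_SITE))  # ['ACGTAC']
```

A 5-bp site cannot occur inside 3- or 4-bp barcodes, so only the longer barcodes are affected by this step. For a non-palindromic site, the reverse complement would also need to be screened.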
Keywords/Search Tags:Barcode, Adaptor, R language, DNA library, Gene sequencing