Research on Contig-Based Two-Sided Scaffold Filling

Posted on: 2024-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: S H Li
GTID: 2530307076974819
Subject: Computer application technology
Abstract/Summary:
With the rapid development of modern biotechnology, gene defect detection and gene-based drug research have received increasing attention, and the demand for genome data is rising steadily. In theory, complete genome data can be obtained through whole-genome resequencing. In practice, because of the limitations of sequencing technology, coverage depth and coverage rate vary across the genome: some regions are sequenced at low depth or not covered at all, so genetic data are easily lost. Missing data affect the analysis of genomic data, and genome assembly from incomplete genomic data has therefore received widespread attention.

Genome assembly is the process of assembling the large number of short reads generated by high-throughput DNA sequencing into complete genomic sequences according to their relative positions, sequence overlaps, and other characteristics. The goal is to splice these fragments together so that they match the actual genome sequence, while keeping the length of the assembled genome as close as possible to the actual length. Genome assembly is a crucial step in genomic data analysis. Currently, the reads obtained by third-generation sequencing technology are generally still short relative to a complete genome, and researchers often combine different sequencing techniques, assembly algorithms, and quality control strategies to achieve better assembly results.

The genome scaffold filling problem is an important part of genome assembly. It is an emerging combinatorial optimization problem in computational biology and plays an important role in biology, genetic engineering, and other fields. As gene sequencing technology has developed, the cost of sequencing has continued to fall and its efficiency has continued to rise. Genomic data have shifted from single genes to the form of scaffold contigs. As more and more software becomes available for obtaining scaffold contigs, research on genome scaffold contigs is growing rapidly. The main work of this thesis is summarized as follows:

(1) For the two-sided genome scaffold filling problem based on scaffold contigs, a class of TSSF-max-BC instances in which the two input permutations do not contradict each other is studied in depth. Using a greedy strategy, a polynomial-time filling algorithm is proposed, its correctness is proved, and its time complexity is analyzed. A program that visualizes genome scaffold filling was implemented, verifying the correctness and effectiveness of the algorithm.

(2) For general instances of the two-sided genome scaffold filling problem based on scaffold contigs, the Type-II missing genes are first divided into finer subclasses. Rules for establishing edges between nodes in a weighted general graph are then set, and the edge weights are formally defined. A construction method for weighted general graphs is proposed, providing new ideas and methods for solving this type of problem.

(3) For the two-sided genome scaffold filling problem with repeated genes based on scaffold contigs, a class of R-TSSF-max-BC instances is studied in which each gene occurs at most twice and Type-I and Type-III missing strings occur in a 1:1 ratio. The missing strings are classified, an auxiliary graph is constructed by analyzing the relationships between the related strings, and the relationship between the auxiliary graph and the optimal solution is studied. Using a greedy strategy, a maximum matching algorithm, and a backtracking algorithm, a 2-approximation algorithm is proposed and its correctness is proved. A visualization program was designed and implemented in Python, further confirming the feasibility of the algorithm.
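
To make the problem setting concrete, the following is a minimal Python sketch of what a contig-based two-sided filling looks like. It is an illustration only, not the thesis's algorithm. It rests on two assumptions that the abstract does not state: that missing genes may only be inserted between contigs (never inside one), and that a filling is scored by the number of adjacencies shared by the two filled sequences, as is common in the scaffold filling literature. The function names and the `gaps` representation are hypothetical, and the set-based adjacency count is only valid when no gene is repeated.

```python
def fill_between_contigs(contigs, gaps):
    """Concatenate contigs, inserting gaps[i] (a list of genes) after contigs[i].

    Hypothetical representation: contigs are kept intact, and missing
    genes are only ever placed in the gaps between them.
    """
    filled = []
    for i, contig in enumerate(contigs):
        filled.extend(contig)
        filled.extend(gaps.get(i, []))
    return filled


def adjacencies(seq):
    """Unordered pairs of neighbouring genes (assumes no repeated genes)."""
    return {frozenset(pair) for pair in zip(seq, seq[1:])}


def common_adjacencies(a, b):
    """Number of adjacencies shared by two gene sequences (assumed objective)."""
    return len(adjacencies(a) & adjacencies(b))


# Two incomplete scaffolds: A is missing genes c and d, B is missing gene b.
a_contigs = [["a", "b"], ["e"]]
b_contigs = [["a"], ["c", "d", "e"]]

filled_b = fill_between_contigs(b_contigs, {0: ["b"]})       # a b c d e
good_a = fill_between_contigs(a_contigs, {0: ["c", "d"]})    # a b c d e
poor_a = fill_between_contigs(a_contigs, {0: ["d", "c"]})    # a b d c e

print(common_adjacencies(good_a, filled_b))  # 4
print(common_adjacencies(poor_a, filled_b))  # 2
```

Under these assumptions, the order in which the missing genes are placed in a gap changes the score, which is what makes the choice of filling an optimization problem.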
Keywords/Search Tags: contig, scaffold filling, duplicate genes, polynomial-time algorithms, approximation algorithms