| The various genetic mutations that occur within human cells are the main cause of cancer.In the study of gene mutation at the area of molecular biology,scientists usually use Gene Copy Number Profile(CNP)to record the genetic heterogeneity of a gene sequence before and after the mutation process.The changes in the numerical values of each element in CNP are usually due to the occurrence of duplication or deletion of corresponding elements in the gene sequence.Pathologists believe that mastering the process of CNP changes will help model the process of gene mutations,better understand the pathology of gene mutations,and has important guiding significance for cancer prevention and related drug development.In 2016,Shamir et al.first studied the problem of converting two different CNPs based on copy and delete operations,that is,how to convert one CNP into another target CNP during the minimum number of operations,and provided an exact algorithm with linear time complexity.In 2018,based on Shamir’s research work,Letu et al.restricted the operation of gene sequences to tandem duplication and deletion,that is,given a gene sequence G and a target CNP,seeking the minimum number of tandem duplication/deletion operations,transforming G into G ’,making the CNP of G’ the same as another target CNP.This problem is called the Minimum Copy Number Generation Problem(abbreviated as MCNG).At present,research has proposed various algorithms that can achieve conversion between different CNPs,but there is a lack of relevant research on whether the MCNG problem has an optimal solution in certain specific situations for a long time.In this article,we only consider the tandem duplication operation and study the problem of the Minimum Tandem Duplication Copy Number Generation Problem(abbreviated as MTDCNG).In this paper,we study the computational complexity and algorithm of MTDCNG problem under various constraints.Firstly,the number of occurrences of each element in the input sequence G of MTDCNG is limited to a constant c,and the c-MTDCNG problem is proposed to prove that when c=4,the problem is NP-hard and cannot be approximated to within 144/143;It is proven that the problem is also NP-hard when c=3,means that all MTDCNG problems are NP-hard.Next,when c=1,a binary duplication matrix is proposed based on the target CNP.When the binary duplication matrix is unimodal and there is only one "1" in each column,a polynomial time exact algorithm for MTDCNG problem is designed.The number of tandem duplications is exactly log d,and the time complexity is O(nd·log d),where d is the maximum number in the target CNP。When there are more peaks in the binary duplication matrix,we devise an approximation algorithm,which guarantees an approximation ratio of t(1-1/log(d))and runs in O(n2d·log d)time,where t is the maximum number of peaks in-between which the lowest value is not 1,and d is the maximum value of all the peaks.When the binary duplication matrix is unimodal and the ’1’ in each row and column is continuous,a polynomial time exact algorithm for MTDCNG problem is designed.The number of tandem duplications times is between[log d]and[2 × log d],and the time complexity is O(nd·log d),where d is the maximum number in the target CNP. |