| In the rapid development of biotechnology in the 21st century,the genetic variation research has become one of the most popular researchs after the completion of the Human Genome sequencing project.Single Nucleotide Polymorphisms is one of the most prominent common forms in all kinds of genetic variation,the study of SNP can explore the evolution of biological processes,to solve many problems about the diagnosis and treatment of genetic diseases and the mechanism of susceptibility,and play an important role in drug development,etc.In some related studies,it was found that the genetic information carried by the haplotype was more abundant than the single SNP phase.However,the use of biological methods for individual haplotype sequencing will be affected by the limitation of experimental conditions and costly constraints,the calculation problem of the individual haplotype reconstruction arises at the historic moment.It means to complete the reconstruction of the haplotypes by computing technology based on the individual DNA sequencing fragments.According to the individual DNA sequencing fragments,the use of computing technology to complete the reconstruction of the haplotype.This paper studies the diploid individual haplotypes reconstruction problem.HapCompass is an effective heuristic algorithm for solving the minimum weighted edge removal(MWER)model.The HapCompass algorithm eliminated conflicting cycle basis by deleting the edge with the minimum absolute value of weight.When there were several edges whose absolute values of weights were equal to the minimum one,HapCompass chose the one deleted randomly,which led to produce uncertain solutions and decrease the reconstruction effect.Aiming at this problem,algorithm IHapCompass was proposed.It improved the rules of deleting edges to limit the random value effectively.IHapCompass took advantage of the relativity between the difference of fragment number of 0011and that of 0110and the total fragment number to ascertain the deleted edge.It made use of the probability of 0/1 in haplotype to determine the SNP value for an isolated vertex,which ascertained the value of an isolated vertex.Using the haplotype data in the CEPH samples published by HapMap,the fragment data were generated by two kinds of simulators CELSIM and MetaSim.The reconstruction rate and running time were compared and analyzed among algorithms IHapCompass,HapCompass,DGS and Fast Hare with different parameters settings,such as fragment coverage,error rate,single fragment length and haplotype length.Under different parameter settings,the IHapCompass algorithm can obtain highest reconstruction rate and has high efficiency,which were proved by a number of experiments.The minimum error correction model MEC attempts to correct the least sequencing error to determine the unique phasing of haplotype.In this paper,an effective heuristic algorithm HIEF is proposed for this model.By analyzing the data characteristics of real sequencing data NA12878,it is found that the rule exists in the real DNA sequencing fragment data,that is,the more fragments of the number of non-empty phase,the more reliable the information carried.Based on this idea,the algorithm HIEF first selects the sequencing fragment with the most number of non-empty phases as a benchmark.Based on the MEC model,the remaining fragments are divided into two incompatible sets.In the process of division,the sequencing fragments with the most number of non-empty phases are continually expanded,eventually expanded to complete reconstruction of diploid haplotype.Using CELSIM generate simulated data and NA12878 real sequencing fragments data test and analysis the performance of algorithm,the experimental results show that the algorithm can obtain better performance than Exact,Fast Hare and DGS in several evaluation indexes.To sum up,this paper studies the diploid haplotype reconstruction algorithm problem.The several evaluation indexes by a large number of experiments show that the IHapCompass and HIEF algorithm proposed in this paper can gain higher precision of the reconstruction results.They can solve experiment cost and time cost of haplotype reconstruction problem effectively,and provide reference and help for biological researchers. |