Research On Sequence Analysis And Similarity Study Of SARS-CoV-2 Gene Sequence Based On Variant Measurement And Information Entropy

Posted on: 2022-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Qiao
GTID: 2480306335956759
Subject: Fundamental Medicine
Abstract:
Since the successful completion of human genome sequencing 20 years ago, modern bioinformatics research has developed rapidly. By using the data-processing power of modern computers to build high-throughput gene expression analysis platforms that analyze the structure and functional characteristics of biological sequences, researchers have solved more and more problems in biological information. Gene sequence analysis and sequence similarity alignment have long been a focus of research. Similarity analysis of different gene sequences supports species classification, gene identification, gene bank retrieval, the study of evolutionary relationships between species, and the mapping between structure and function. Because gene sequences differ in length, involve large amounts of data, and are highly complex, acquiring sequence characteristics quickly, accurately, and efficiently has become a significant challenge for frontier bioinformatics research.

SARS-CoV-2 has caused a pandemic that has threatened human life and health worldwide since its emergence in January 2020. By analyzing virus sequences and studying the similarity between functional blocks of sequences, this thesis provides scientific support for tracing the origin of the novel coronavirus and for vaccine development. These core topics are important for addressing current and future coronavirus problems.

Taking SARS-CoV-2 and a variety of other virus sequences as the research data, this thesis proposes a method for gene sequence analysis and similarity study based on variant measurement and information entropy. First, the base sequences are transformed to obtain variant measure combinations of multiple features, from which the global feature frequencies and global feature probabilities of each sequence are computed. The feature probability differences are then derived, and the overall features of each sequence are analyzed using the feature probability difference matrix and projection diagrams. Next, the local feature probabilities and local feature probability differences are calculated for different segments of each sequence. Combined with the feature probability difference matrix, the measure that is consistent with the global feature probability difference is identified, and the local features of the sequence are analyzed by projecting them onto a two-dimensional variant value graph. Finally, information entropy is calculated from the feature probability with the most significant contribution, the distance between sequences is calculated from the information entropy, and the resulting sequence distance matrix is used to generate a phylogenetic tree that accurately expresses the degree of similarity between gene sequences.

Batch experiments show that the method is efficient for sequence measurement and similarity analysis. Compared with existing methods, the VMIE model has lower running time complexity and fewer dependencies, and it provides an efficient way to analyze the similarity of unaligned sequences. It offers new technical support for tracing the source of novel coronavirus sequences, for vaccine development, and for other measurement and analysis applications.
Keywords: SARS-CoV-2, Variant measurement, Information entropy, Sequence similarity, Sequence visualization
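
The final step described in the abstract (feature probabilities, then information entropy, then a sequence distance matrix) can be illustrated with a minimal Python sketch. The abstract does not give the feature definition or the distance formula, so everything here is an assumption made for illustration: overlapping k-mer frequencies stand in for the variant measurement features, and the distance between two sequences is taken as the absolute difference of their Shannon entropies. The function names are hypothetical and this is not the VMIE implementation.

```python
import math
from collections import Counter


def kmer_probabilities(seq, k=2):
    """Relative frequency of each overlapping k-mer in seq.

    Assumption: k-mer frequencies are a stand-in for the thesis's
    variant measurement features, which the abstract does not define.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def entropy_distance_matrix(seqs, k=2):
    """Pairwise |H_i - H_j| over a list of sequences.

    Assumption: the absolute entropy difference is used as the distance;
    the source only says distance is calculated from information entropy.
    """
    entropies = [shannon_entropy(kmer_probabilities(s, k)) for s in seqs]
    return [[abs(hi - hj) for hj in entropies] for hi in entropies]


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    toy = ["ACGTACGTAC", "ACGTACGTTC", "AAAAAAAAAA"]
    for row in entropy_distance_matrix(toy, k=1):
        print(["%.3f" % d for d in row])
```

Because each sequence is reduced to a fixed-size set of probabilities, no alignment is needed and sequences of different lengths can be compared directly. A distance matrix of this kind could then be passed to a standard tree-building routine to produce a phylogenetic tree.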