| With the emergence and continuous development of high-throughput sequencing technology, biological sequence analysis has become widespread, and more and more biological research examines changes at the genetic level. At the same time, bioinformatics workflows for genomics, transcriptomics, and proteomics are maturing, and correlation analysis has become almost a standard indicator of the depth of a biological study. Within these analyses, changes in the temporal and spatial expression of genes and gene mutations are often the main objects of study. When a cell or an individual organism adapts to a specific environmental change, its genes may undergo specific or non-specific mutations, and these genotype changes are often associated with changes in phenotype. For example, in laboratory experimental evolution, researchers usually adapt bacteria to designed culture conditions through long-term serial subculturing, observe the changes in the phenotype of the samples, and identify the mutations that arise during evolution by sequencing, in order to study the association between specific genotypes and phenotypes. Among mutated genes, missense variants attract the most attention. A missense mutation is a base substitution that changes the encoded amino acid. The relationship between such an amino acid change and the overall function of the protein has long been both a focus and a difficulty of protein function research. However, experimentally verifying the function of a mutant protein is costly. In addition, a single variant sample usually contains hundreds or thousands of missense mutations, so comprehensive experimental verification is time-consuming, laborious, and impractical. In recent years, artificial intelligence technologies have developed rapidly, and their applications in biology have become increasingly widespread. Predicting or analyzing missense variants with computational methods has become a goal of much protein-related bioinformatics research. Moreover, the emergence of deep mutational scanning has provided a large amount of reliable experimental data, which makes it more feasible to predict and analyze missense variant effects with machine learning methods that depend on large sample sizes. This paper uses a deep generative network to fit the probability distribution of functional proteins from the same protein family, and predicts the effect of a missense variant from the probability with which the model generates that variant. To improve the fitting ability of the model, this paper introduces a word embedding network and an attention mechanism, with the aim of further improving the prediction results. In addition, this paper applies the resulting missense variant prediction method to two newly sequenced wall-deficient E. coli L-form strains cultured in our laboratory, in order to study the gene mutations associated with the loss of the bacterial cell wall. In the comparative genomic analysis of the L-form cells and wild-type bacteria, this paper adds missense variant prediction and uses it to screen and compare a series of genes carrying missense variants. The predictions make the comparative analysis more detailed and credible, and provide a basic reference for downstream gene enrichment, pathway analysis, and functional analysis of the mutated proteins. |
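
The idea of scoring a missense variant by its generation probability can be illustrated with a minimal sketch. This is not the deep generative network with word embedding and attention described above. As a stand-in, it assumes a much simpler site-independent frequency model fitted to a small, made-up, gap-free alignment of one protein family, and scores a variant as the log-probability of the mutant sequence minus that of the wild type. The function names, the pseudocount, the toy sequences, and the `A2V`-style variant notation are all illustrative assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-position amino acid probabilities from aligned sequences.

    Assumption: a site-independent model stands in for the deep generative
    network; every sequence has the same aligned length.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    model = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        # Only the 20 standard residues contribute to the normalizer.
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        model.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return model


def log_prob(model, sequence):
    """Log-probability of a sequence under the site-independent model."""
    return sum(math.log(site[aa]) for site, aa in zip(model, sequence))


def apply_substitution(sequence, variant):
    """Apply a variant such as 'V3I' (wild-type residue, 1-based position, mutant)."""
    wild, position, mutant = variant[0], int(variant[1:-1]), variant[-1]
    if sequence[position - 1] != wild:
        raise ValueError(f"{variant}: wild-type residue does not match the sequence")
    return sequence[: position - 1] + mutant + sequence[position:]


def variant_score(model, wild_type, variant):
    """Log-probability ratio of mutant to wild type; lower suggests more damaging."""
    mutant = apply_substitution(wild_type, variant)
    return log_prob(model, mutant) - log_prob(model, wild_type)


# Hypothetical toy family, not real data.
family = ["MKVLA", "MKVLG", "MKILA", "MRVLA"]
model = fit_site_frequencies(family)
for variant in ("V3I", "V3W"):
    print(variant, round(variant_score(model, "MKVLA", variant), 3))
```

In this toy family, isoleucine is observed at position 3 and tryptophan is not, so `V3I` scores higher (closer to zero) than `V3W`. A deep generative model plays the same role as `log_prob` here but can capture dependencies between positions, which a site-independent model cannot.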