Using Cellular Automata to Simulate Domain Evolution in Proteins

Posted on: 2021-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: G F Xue
GTID: 2370330602469784
Subject: Statistics
Abstract/Summary:
With the completion of the Human Genome Project, the focus of bioinformatics research has shifted from accumulating biological data to processing that data and extracting information from it. However, the study of protein evolution mechanisms is still limited to simple biological experiments and statistical analysis. This thesis introduces cellular automata to design a set of evolution rules for the fusion, fission, insertion, and deletion of protein domain architectures, so as to simulate the evolution of proteins in their natural state. It also uses cellular automata and image feature processing methods to convert viral genome data into feature images and analyze them. The specific research contents and results are as follows:

1) Construct a protein domain coding model. The domain architecture replaces the traditional amino acid sequence of the protein, which simplifies the expression of protein information and captures the connections among the various conserved regions. Moreover, this coding method incorporates the structural information of the protein, maps the amino acid changes that occur during evolution onto changes in the domains where they are located, and provides ideas for sequence homology comparison and molecular evolution.

2) Establish the rules of protein domain evolution. To simulate the evolution of proteins in the natural state, this thesis designs "inheritance rules", "back-to-forward rules", "forward-to-back rules", and "maintain ? rules" as the state-update mapping function of the cellular automaton. In addition, by analyzing the domain architecture of each protein in the data set and counting the positional relationships between pairs of domains, two probability matrices are obtained. A roulette algorithm then links the position information of each domain to the evolution rules, simulating the diversity and randomness of the natural environment. When the model is tested on the human RhoGEF protein family, its accuracy reaches 90.27%. Analysis of the simulated domain architectures shows that the model reproduces domain fusion, fission, insertion, and deletion. Moreover, the frequency distribution of domain neighbors follows a power law and is consistent with the supra-domain concept. This model has application prospects for studying the direction of evolution of protein domain architectures.

3) Construct a gene sequence visualization model. Bases are encoded according to their structural categories, so that a one-dimensional gene sequence becomes a one-dimensional binary sequence. On this basis, cellular automata are used to construct a visual model of viral genes, and the resulting image is processed with the Canny edge detection algorithm. The results show that the feature image of a coronavirus consists mainly of "/"-shaped diagonal stripes to the left. The feature images of the SARS-related viruses (2019-nCoV and SARS-CoV) all contain six V-shaped cross regions. The feature image of MERS, which is not a SARS-related virus, has only one V-shaped cross region. The feature image of the non-coronavirus Ebola consists mainly of "\"-shaped diagonal stripes to the right, although it has three V-shaped cross regions. In addition, calculating the structural similarity between feature images shows that the similarity between the feature images of the bat coronavirus RaTG13 and 2019-nCoV reaches 77.12%, and that between the pangolin coronavirus and 2019-nCoV reaches 73.36%. This model addresses the conversion, processing, display, and analysis of gene sequence data, and provides researchers with a new way to analyze the characteristics and functions of gene sequences.
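The roulette step mentioned in part 2) can be illustrated with a minimal Python sketch of roulette-wheel (fitness-proportionate) selection. This is not the thesis's implementation: the function names `roulette_select` and `pick_rule`, the short rule labels, and the probability values are all assumptions made for illustration, and the way the thesis derives its two probability matrices is not reproduced here.

```python
import random

# Shorthand labels for the four rule families named in the abstract.
# The labels themselves are illustrative; the fourth rule's full name
# is not recoverable from the abstract text.
RULES = ["inheritance", "back_to_forward", "forward_to_back", "maintain"]


def roulette_select(weights, rng=random):
    """Return index i with probability weights[i] / sum(weights)."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be a non-empty list of non-negative numbers")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    spin = rng.random() * total      # where the wheel stops, in [0, total)
    cumulative = 0.0
    last_positive = None
    for i, w in enumerate(weights):
        if w > 0:
            last_positive = i
        cumulative += w
        if spin < cumulative:        # the spin landed inside slice i
            return i
    # Only reachable through floating-point round-off near the top end.
    return last_positive


def pick_rule(rule_probs, rng=random):
    """Choose one rule label given one probability per entry of RULES."""
    if len(rule_probs) != len(RULES):
        raise ValueError("expected one probability per rule")
    return RULES[roulette_select(rule_probs, rng)]


if __name__ == "__main__":
    # Hypothetical row of a probability matrix for one domain position.
    example_row = [0.5, 0.3, 0.2, 0.0]
    print(pick_rule(example_row, random.Random(42)))
```

Each rule occupies a slice of the wheel proportional to its probability, so frequent positional patterns are favored while rare ones remain possible, which is one way to introduce the randomness the abstract describes.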
Keywords/Search Tags:Cellular automata, protein evolution, domains, gene sequence visualization, 2019-nCoV