In recent years, with the rapid development of biotechnology, the focus of biological research has gradually shifted from accumulating data to interpreting and analyzing it. Bioinformatics has therefore emerged and rapidly become a new frontier of biological research. As the name suggests, bioinformatics is the study of biological data. It is a developing interdisciplinary field that draws on mathematical and computational methods, computer programming, and knowledge from other disciplines to store, retrieve, and analyze biological information. Bioinformatics covers a broad range of research areas; this thesis focuses on sequence analysis. The main contents are as follows. The 2-D graphical representation of DNA sequences proposed by Nandy has been used in many bioinformatics problems. Unfortunately, this graphical representation is degenerate. Guo later improved it by rotating the four directions in the two-dimensional plane by a small angle, which greatly reduces the degeneracy of the graphical representation. However, degeneracy is not completely avoided. Inspired by the improvement of Guo et al., we propose a 3-D generalized Nandy graphical representation of DNA sequences by associating the four bases (A, C, G, and T) with four direction vectors in three-dimensional space. We prove that the resulting graph contains no circuits (closed loops), which guarantees that the representation is nondegenerate. We numerically characterize a DNA sequence by the ALE-index of its L/L matrix and by the graph radius. Borrowing the concept of a gravitational field from physics, we construct a potential function among data objects in vector form, whose value reflects the relationship between sequences. The K-nearest neighbor algorithm serves as the classifier. The utility of the proposed approach is illustrated by the recognition of 208 RIG-I genes.
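The difference between a degenerate 2-D walk and a nondegenerate 3-D walk can be illustrated with a minimal sketch. The direction vectors below are assumptions chosen for illustration only and are not the vectors defined in this thesis: the 2-D steps are one axis-aligned assignment of the four bases, and the hypothetical 3-D steps lift each of them by one unit along z. Because every 3-D step increases z, no point can be visited twice, which is one simple way to obtain a circuit-free curve. The names `DIRECTIONS_2D`, `DIRECTIONS_3D`, `walk`, and `revisits_a_point` are likewise illustrative.

```python
# Assumed direction vectors for illustration; the thesis defines its own.
DIRECTIONS_2D = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}

# Hypothetical 3-D lift: same x/y step, plus +1 along z for every base.
DIRECTIONS_3D = {base: (dx, dy, 1) for base, (dx, dy) in DIRECTIONS_2D.items()}


def walk(sequence, directions):
    """Return the points visited by the walk, starting at the origin.

    Raises KeyError for any character that is not A, C, G, or T.
    """
    dim = len(next(iter(directions.values())))
    point = (0,) * dim
    points = [point]
    for base in sequence.upper():
        step = directions[base]
        point = tuple(p + s for p, s in zip(point, step))
        points.append(point)
    return points


def revisits_a_point(points):
    """True if the walk passes through some point more than once."""
    return len(set(points)) < len(points)


seq = "AGCT"
path_2d = walk(seq, DIRECTIONS_2D)
path_3d = walk(seq, DIRECTIONS_3D)

print(path_2d, revisits_a_point(path_2d))
print(path_3d, revisits_a_point(path_3d))
```

In this sketch the 2-D walk for `AGCT` returns to the origin after `AG` and again after `CT`, so part of the sequence is hidden in the plot, whereas the lifted 3-D walk visits five distinct points.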