| Biological data, especially data on biological macromolecules, are an important resource for exploring the origin of life and studying the laws of life. The complexity of living systems gives biological data the characteristics of diversity and high dimensionality, so structural modeling of living systems is an important research method. Modeling describes the complex regulatory mechanisms within a system, and further simplified structural modeling reduces the complexity of the system, which makes it possible to explore the regulatory mechanisms of biological macromolecules in living systems at low cost. Granular computing is a concept and computing paradigm that handles problems through information granulation; it can be used to analyze the basic principles and mechanisms of complex problems and to solve them. This paper mainly applies the idea of granular computing to complex living systems: by means of clustering, a system is granulated at different granularities, and the complex system is analyzed at different levels and from different perspectives. The main research contents of this paper are as follows. In Chapter 2, the basic concepts of granular computing are introduced in detail, the granularity space is constructed from the isosceles normalized distance and the concept of equivalence classes, and the granularity space is then combined with clustering. Formulas for the intra-class difference and the inter-class difference in the granularity space are given. Based on the property that, as the number of classes decreases, the intra-class difference increases and the inter-class difference decreases while their sum remains unchanged, the optimal clustering index FHEI is introduced. Finally, the concepts of two cluster fusion techniques based on the isosceles normalized distance are given. In Chapter 3, based on the structural protein sequence data of the Beta coronavirus, the idea of granular computing is used to trace the origin of SARS-CoV-2. First, using the 595 Beta 
coronavirus M structural protein sequences on NCBI, features are extracted from the viral protein sequences and a measure of similarity between sequences is constructed. Next, Algorithm A is designed on the basis of FHEI; taking the resulting similarity measurements as input, it performs hierarchical clustering of the Beta coronaviruses and yields the optimal clustering structure. Finally, using the nearest center principle and the optimal clustering structure, a coronavirus evolutionary tree is constructed and the origin of the virus is traced. The experimental results show that SARS-CoV-2 is closely related to bat coronavirus (RaTG13) and pangolin coronavirus. This is consistent with results in the existing literature, indicating that the method is effective. In Chapter 4, the technique of fusing clustering structures is introduced. The concepts of complete and incomplete granularity spaces are first introduced, and several properties of fusion clustering structures are established. Using these properties, it is proved that an optimal fusion clustering structure must exist in the granularity space generated by the fusion clustering structures, which enriches the theoretical framework of granular computing. Clustering fusion is then applied, guided by the idea of granular computing, and the AB method for obtaining the representative elements of a multi-attribute data system is proposed. First, Algorithm A from Chapter 3 is improved into Algorithm A2, which obtains the optimal clustering structure of each attribute of the data system and uses it as a base cluster. Second, Algorithm B is designed using the concept of information entropy; it takes the base clusters as input and returns the optimal fusion clustering structure. Then, using the nearest center principle, the representative element of each class in the optimal fusion clustering structure is obtained. Finally, differentially expressed genes are 
screened from cervical cancer gene expression data, and the four interaction-relationship scores between the differentially expressed genes are treated as a multi-attribute data system and input into the AB method. The five resulting representative elements serve as predictors of cervical cancer: RTTN, SAMD10, ZNF207, WAC, and METTL14. The results show that these predictors achieve a classification accuracy of 98.82%. The AB method is also compared with other classical methods on six independent gene expression datasets, and the results show that it is superior in classifying patient samples, in particular achieving high classification accuracy with a small number of predictors. |
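
The property behind FHEI (coarser clusterings raise the intra-class difference and lower the inter-class difference while their sum stays fixed) and the nearest center principle can be illustrated with a small sketch. It rests on assumptions that are not taken from this paper: squared Euclidean deviations stand in for the intra-class and inter-class differences defined on the isosceles normalized distance, and the names `within_between` and `representative` are hypothetical helpers, not Algorithm A, A2, or B.

```python
from math import dist, isclose


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_between(clusters):
    """Return (intra-class, inter-class) sums of squared deviations.

    Assumption: squared Euclidean deviation is used as a stand-in for the
    differences defined in the paper.
    """
    all_points = [p for cluster in clusters for p in cluster]
    grand = mean(all_points)
    within = sum(dist(p, mean(c)) ** 2 for c in clusters for p in c)
    between = sum(len(c) * dist(mean(c), grand) ** 2 for c in clusters)
    return within, between


def representative(cluster):
    """Nearest-center choice: the member closest to the cluster mean."""
    center = mean(cluster)
    return min(cluster, key=lambda p: dist(p, center))


# A fine partition with three classes, and a coarser one that merges the
# first two classes (one step up a clustering hierarchy).
fine = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(5.0, 5.0), (6.0, 5.0)],
    [(10.0, 0.0), (11.0, 1.0)],
]
coarse = [fine[0] + fine[1], fine[2]]

w_fine, b_fine = within_between(fine)
w_coarse, b_coarse = within_between(coarse)

print("fine:  ", w_fine, b_fine, w_fine + b_fine)
print("coarse:", w_coarse, b_coarse, w_coarse + b_coarse)
print("representative:", representative([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]))
```

Because the sum is constant across granularities, an index built from these two quantities only has to trade one against the other, which is what makes a single optimal level of the hierarchy well defined in this simplified setting.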