Modeling The Complex Biological System And Exploring Its Network Structure Based On The Granular Computing

Posted on: 2018-12-03  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li  Full Text: PDF
GTID: 2310330512959255  Subject: Applied Mathematics
Abstract/Summary:
In the post-genomic era, humans have gained a deeper understanding of the laws governing life. Exploring biological big data efficiently has become a trend in the life sciences, which has promoted the development of bioinformatics. Bioinformatics covers a broad research area, including the sequence, structure and function of single molecules. As the life sciences have developed, the focus has shifted to exploring complex biological networks based on omics data, covering the interactions among molecules and the structure and function of different modules. To reveal the regulatory or metabolic mechanisms of a biological system, it is necessary to extract its systematic structure and reduce its complexity. Granular computing plays an important role in the fields of information processing and artificial intelligence. Based on the idea of coarse graining, a complex system can be analyzed at different levels. The exploration of biological systems and networks, and even of modular functions, has attracted the attention of researchers. This thesis focuses on the reduction of biological systems and on knowledge extraction based on granular computing, for large sample sets and high-dimensional data sets respectively, as follows:

1. Taking the influenza virus system as an example, a feature vector is obtained to represent each virus protein sequence, and an approach is given for constructing the hierarchical structure of the virus system by analyzing the similarity among multiple protein sequences. A suitable number of classes is determined according to an assessment index based on the system structure. Furthermore, following the nearest-to-center principle, signature viruses are selected to represent the characteristics of whole classes. Finally, a phylogenetic tree is established by a hierarchical clustering algorithm using the distance metric. The results indicate that influenza viruses with the same host, a similar time span, a close outbreak location and the same name are more likely to belong to the same branch, which supports the effectiveness of the proposed method.

2. Structural clustering is explored and the optimal hierarchical structure is constructed based on granular theory. Based on the idea of coarse graining, intra-class deviation and inter-class deviation are introduced to measure the differences within a class and among classes, respectively, at different levels of the hierarchical structure (the granular space). A hierarchical evaluation index is then proposed to help select an optimal structure according to clustering criteria. On this basis, an algorithm is developed to construct the optimal structure and obtain the multilevel structure of the system. Notably, a classifier is designed to validate the method. The results for the influenza virus system show that the signature viruses can approximate the whole system.

3. The intrinsic differences among immunohistochemistry (IHC)-defined subtypes of breast cancer are explored using high-dimensional gene expression profiling. From the differentially expressed genes obtained, feature genes comprising 119 mRNAs and 20 miRNAs are selected by reducing the dimension of the differential gene set so as to achieve the best subtype identification. The feature gene set outperforms other known gene sets in subtyping. Furthermore, network and pathway analyses are conducted for the selected feature genes. Some cancer-related pathways are enriched in the feature genes, and some key molecules are densely connected with other genes in the network, which elucidates the role of the feature genes in breast cancer differentiation.

4. The differences and relationships among pair-wise subtypes are explored. Considering the differentiation factors of breast cancer, a decision tree is constructed according to the status of the IHC molecules, with ER status as the priority factor, followed by HER2 status. The differentially expressed genes among pair-wise subtypes across the tree are then explored, which bridges the gap between immunohistochemistry markers and gene expression profiling in breast tumor subtyping. Based on the feature genes obtained, comprising 30 mRNAs and 7 miRNAs, the subtypes of breast tumors in other datasets are identified. Furthermore, network and pathway analyses reveal the physical interactions and relationships among the selected feature genes, which helps to explore the intrinsic relationships among the subtypes.

The innovations of this thesis are as follows:

1. A hierarchical evaluation index is presented and proved theoretically to be globally optimal. The proposed models can be used for constructing sub-hierarchical structures and for system simplification.

2. With the goal of achieving the best classification, differentially expressed genes are selected to analyze the intrinsic differences among breast cancer subtypes by gradually reducing the dimension of the relevant genes. Furthermore, the biological significance of the intrinsic genes is explored.

3. In consideration of breast cancer differentiation, a decision tree is constructed according to the IHC molecules. The differentially expressed genes among pair-wise subtypes are identified and then used to explore the intrinsic relationships among the breast cancer subtypes.
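
The abstract does not give formulas for the intra-class deviation, the inter-class deviation, the hierarchical evaluation index, or the nearest-to-center rule, so the short Python sketch below is only an illustration under stated assumptions. It assumes Euclidean distance on feature vectors, takes the intra-class deviation to be the mean distance of members to their class center, takes the inter-class deviation to be the mean distance of class centers to the overall center, and uses their ratio as a stand-in index (lower is better). All function names and the toy data are hypothetical and are not taken from the thesis.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def intra_class_deviation(classes):
    """Assumed definition: mean distance from each member to its class center."""
    total = sum(dist(p, centroid(members)) for members in classes for p in members)
    count = sum(len(members) for members in classes)
    return total / count


def inter_class_deviation(classes):
    """Assumed definition: mean distance from each class center to the overall center."""
    overall = centroid([p for members in classes for p in members])
    return sum(dist(centroid(members), overall) for members in classes) / len(classes)


def evaluation_index(classes):
    """Hypothetical stand-in index: compact, well-separated classes score lower."""
    return intra_class_deviation(classes) / inter_class_deviation(classes)


def signature(members):
    """Nearest-to-center principle: the member closest to the class center."""
    center = centroid(members)
    return min(members, key=lambda p: dist(p, center))


# Toy feature vectors (hypothetical data): two tight groups on a line.
group_a = [(0, 0), (2, 0), (1, 0)]
group_b = [(10, 0), (12, 0), (11, 0)]

good_partition = [group_a, group_b]
bad_partition = [[(0, 0), (10, 0), (1, 0)], [(2, 0), (12, 0), (11, 0)]]

# The partition with the lower index would be preferred under these assumptions.
best = min([good_partition, bad_partition], key=evaluation_index)
signatures = [signature(members) for members in best]
```

In this toy case the partition that keeps each tight group together scores lower, and the signature of each class is its middle point. Comparing candidate levels of a hierarchy by such an index and then keeping one representative per class mirrors the workflow described in items 1 and 2, though the thesis's actual index may differ.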
Keywords/Search Tags:biological complex system, granular computing, granular space, signature gene