With the rapid growth of high-dimensional data, clustering has become an increasingly popular analysis method. Among the various clustering methods, graph clustering, attributed graph clustering, and biclustering have become effective techniques for different types of data analysis problems. Metabolism is a basic biological process that exists in all living organisms, and studying the flux distribution over a metabolic network is essential for understanding the network's structure and function. A flux distribution over a metabolic network can be considered a special case of an attributed graph. Samples that show consistent changes in a subnetwork suggest that a metabolic shift may have occurred in that sample group. Detecting such patterns requires considering both the network structure and the subspace structure of the attributes over the network, i.e., an integrated task of biclustering and attributed graph clustering. In this thesis, we propose an algorithm based on attributed graph biclustering (AGBC) for clustering high-dimensional metabolic flux data. The algorithm identifies subnetworks with a consistent increase in flux by integrating network structural information into the biclustering process, and it combines the advantages of attributed graph clustering and biclustering. First, an attributed graph representing the metabolic flux data is constructed; then a Boolean matrix decomposition method is used to embed the structural information of the attributed graph while biclustering the attributes of its edges. A bidirectional growth algorithm and a weak signal detection algorithm are used to iteratively locate submatrices with the same properties.

To validate the effectiveness and efficiency of the AGBC algorithm, we conducted experiments on simulated and real metabolic flux data over the human central metabolic network (the M171 network). Simulated datasets were generated with a flux generation model based on message-passing optimization algorithms, and AGBC was compared with other biclustering algorithms on them; AGBC outperformed these algorithms in terms of reconstruction loss and efficiency. For the real dataset, the factor graph of the metabolic network was first constructed, the flux distribution was then estimated, and the results were discretized before clustering. The experiments showed that, on metabolic networks, the AGBC algorithm outperformed other biclustering algorithms in three respects: it identifies subnetworks with the same properties more accurately, it requires less computation time, and it handles large-scale metabolic network flux data with high precision and efficiency. The AGBC algorithm is therefore of great significance for future research on the human core metabolic network.

Although the AGBC algorithm has shown promising results in these experiments, some limitations and challenges remain. For example, parameter selection, the density of the results, and the data discretization method need further optimization. In addition, the approach currently relies on extensive experimental data, so inaccurate models or noisy data may lead to inaccurate results. Moreover, the AGBC algorithm is computationally intensive, and it may be challenging to apply it to large-scale metabolic networks. In future work, the algorithm should be adapted to the characteristics of other kinds of flux data, such as transportation network flows, so that it also produces good clustering results for further analysis of those data. In summary, the AGBC algorithm is a promising method for clustering high-dimensional metabolic network flux data. By integrating structural information into the biclustering process, it can accurately identify subnetworks with the same properties and efficiently handle large-scale metabolic network flux data. As a tool for analyzing metabolic flux data, it can contribute significantly to the understanding of metabolic pathways and diseases. Future research may focus on developing automatic parameter selection methods and exploring different data discretization methods to address these limitations.
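To make the Boolean matrix decomposition step and the reconstruction loss used in the comparison more concrete, the following Python sketch binarizes an edges × samples flux matrix and counts the mismatches of a Boolean low-rank reconstruction. It is a minimal illustration under stated assumptions, not the AGBC implementation: the discretization rule (a relative increase over a per-edge reference flux) and the threshold `rel_threshold` are hypothetical, and each rank-one component (one column of `A` paired with one row of `B`) is read as one bicluster of edges and samples.

```python
from typing import List

Matrix = List[List[int]]


def discretize_increase(flux: List[List[float]], reference: List[float],
                        rel_threshold: float = 0.2) -> Matrix:
    """Binarize an edges x samples flux matrix.

    Entry (i, j) is 1 when the flux of sample j on edge i exceeds the reference
    flux of edge i by more than rel_threshold * |reference[i]|, else 0.
    The rule and the default threshold are illustrative assumptions.
    """
    return [
        [1 if f - ref > rel_threshold * abs(ref) else 0 for f in row]
        for row, ref in zip(flux, reference)
    ]


def boolean_product(A: Matrix, B: Matrix) -> Matrix:
    """Boolean matrix product: (A o B)[i][j] = OR over t of (A[i][t] AND B[t][j])."""
    rank, n_cols = len(B), len(B[0])
    return [
        [1 if any(A[i][t] and B[t][j] for t in range(rank)) else 0
         for j in range(n_cols)]
        for i in range(len(A))
    ]


def reconstruction_loss(X: Matrix, A: Matrix, B: Matrix) -> int:
    """Count the entries where the Boolean reconstruction A o B differs from X."""
    R = boolean_product(A, B)
    return sum(x != r for x_row, r_row in zip(X, R) for x, r in zip(x_row, r_row))


# Toy discretized data: 4 edges (rows) x 4 samples (columns).
X = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 1]]
A = [[1], [1], [1], [0]]  # edges 0-2 belong to the single bicluster
B = [[1, 1, 0, 0]]        # samples 0-1 belong to the single bicluster
print(reconstruction_loss(X, A, B))  # 3
```

Adding a second component that covers edge 3 and sample 3 (`A = [[1, 0], [1, 0], [1, 0], [0, 1]]`, `B = [[1, 1, 0, 0], [0, 0, 0, 1]]`) lowers the loss to 2; under this mismatch-count definition, a lower loss means the biclusters explain more of the discretized data.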
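The idea of growing a bicluster in two directions while respecting the network can be pictured with the following Python sketch, which alternately extends a bicluster by rows (network edges) and columns (samples) starting from a single seed entry of the discretized matrix. It is a hypothetical, simplified illustration rather than the AGBC procedure: the function name `grow_bicluster`, the requirement that the bicluster contain only ones, and the use of an edge adjacency list (edges assumed adjacent when they share a node) to keep the selected edges connected are all choices made here for clarity, and the weak signal detection step is not modeled.

```python
from typing import Dict, List, Set, Tuple


def grow_bicluster(X: List[List[int]], adjacency: Dict[int, List[int]],
                   seed_edge: int, seed_sample: int) -> Tuple[Set[int], Set[int]]:
    """Greedily grow an all-ones bicluster from a seed entry of X.

    Rows are network edges and columns are samples. A new edge is only added
    if it is adjacent to an edge already in the bicluster, so the selected
    edges stay connected in the network.
    """
    if X[seed_edge][seed_sample] != 1:
        raise ValueError("the seed entry must be 1")
    rows, cols = {seed_edge}, {seed_sample}
    changed = True
    while changed:
        changed = False
        # Row step: add neighbouring edges that are 1 on every selected sample.
        frontier = {nb for e in rows for nb in adjacency.get(e, []) if nb not in rows}
        for e in sorted(frontier):
            if all(X[e][j] == 1 for j in cols):
                rows.add(e)
                changed = True
        # Column step: add samples that are 1 on every selected edge.
        for j in range(len(X[0])):
            if j not in cols and all(X[i][j] == 1 for i in rows):
                cols.add(j)
                changed = True
    return rows, cols


# Toy example: edges 0-1-2-3 form a path; edge 4 is not adjacent to any of them.
X = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 1],
     [1, 1, 1, 0]]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
rows, cols = grow_bicluster(X, adjacency, seed_edge=0, seed_sample=0)
print(sorted(rows), sorted(cols))  # [0, 1] [0, 1, 2]
```

Edge 4 has the same pattern as edges 0 and 1 but is never added because it is not connected to them, which is the effect of embedding network structure. Seeding at edge 2 instead yields edges {0, 1, 2} and samples {0, 1}, so a greedy growth of this kind depends on the seed and on the order of the row and column steps.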