| Ensuring the stable and efficient operation of coal-fired boiler equipment is an important issue in industrial informatization. Cluster analysis, a typical unsupervised learning technique, divides data according to the similarity between samples. It can therefore be used to partition unlabeled coal-fired boiler data by operating condition, thereby guiding the production and operation of coal-fired boilers. However, coal-fired boiler data is large in volume and high in dimensionality, and a single clustering algorithm cannot process it effectively: it cannot adapt to complex and irregular data distributions, and it has significant shortcomings in partition accuracy and robustness. The resulting operating condition partition is therefore not ideal, and its results cannot be guaranteed to be reasonable and effective. Ensemble clustering generates multiple base clustering results for a dataset, selects the highest-quality results as ensemble members, and obtains the final clustering result through consistency fusion. However, traditional ensemble clustering methods have high time complexity, which makes large-scale datasets difficult to analyze, and on large-scale, high-dimensional datasets their results are often less than ideal in accuracy and robustness. To address these issues, this thesis proposes an improved ensemble clustering method tailored to the characteristics of coal-fired boiler data and uses it to partition coal-fired boiler operating conditions. First, the thesis tackles the high time complexity and partitioning difficulty of traditional ensemble clustering methods on large-scale datasets. A mixed representation nearest neighbor approximation method is proposed to extract similarity information from the data, construct multiple sparse
affinity sub-matrices between the data rapidly and effectively, and subsequently generate multiple base clustering results using a graph segmentation method. This partitions the data into multiple clusters and allows more efficient and accurate analysis. Second, to prevent poor-quality base clustering results from degrading the consistency fusion, high-quality base clustering results must be screened and selected. The thesis proposes a base clustering evaluation index based on cluster stability to measure the quality of each base clustering, and designs a selection scheme based on this index to identify the base clustering results suitable for consistency fusion. Lastly, a consistency fusion method based on third-order tensors is presented to combine the selected base clustering results. Tensor decomposition extracts hidden information from the data, which helps mine the complementary information between base clustering results and improves the quality of the final partition. To evaluate the proposed method, the thesis compares it with single clustering algorithms and advanced ensemble clustering algorithms on 10 large-scale datasets: five real UCI datasets and five synthetic datasets. The results show that the proposed method is significantly more effective for large-scale modular data partitioning. Finally, the algorithm is applied to a real-time monitoring system for the working conditions of coal-fired boilers, enabling ensemble clustering of coal-fired boiler status data. This provides early warnings for workers and supports boiler parameter correlation chain queries, allowing workers to adjust the
parameters of coal-fired boiler process objects and ensure stable and efficient boiler operation. |
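
The general ensemble clustering idea summarized above (generate several base clusterings, then fuse them into one partition) can be illustrated with a minimal sketch. This is not the method proposed in the thesis: it assumes plain k-means as the base clusterer in place of the sparse-affinity graph segmentation, a co-association matrix with a majority threshold in place of the third-order tensor fusion, and it omits the stability-based selection step. The toy data and all function names (`kmeans`, `co_association`, `consensus`) are hypothetical and chosen only for illustration.

```python
import random


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means (stand-in base clusterer); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def consensus(co, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold;
    connected components of that graph form the final clusters."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (assumed): two well-separated 2-D blobs of 10 points each.
data_rng = random.Random(0)
points = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(10)]
points += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(10)]

# Base clusterings differ only in their random initialization here.
base = [kmeans(points, k=2, seed=s) for s in range(5)]
final = consensus(co_association(base))
print(final)
```

The dense co-association matrix costs quadratic time and memory in the number of samples, which is the kind of scaling problem on large datasets that motivates the sparse affinity sub-matrices described in the abstract.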