| With the rapid pace of development, the way people live and work is moving toward digitization, informatization, and intelligence. Large amounts of data are generated every moment and transmitted over networks in the form of streams, forming data streams. Mining and exploiting the latent information in data streams in real time is an important way to improve the efficiency of data value conversion. Clustering is one of the important forms of data stream mining: it enables statistical analysis of unlabeled data and provides decision support for other methods. Data streams are dynamic, and the resulting concept drift degrades the performance of computational models, which imposes a new requirement on data stream clustering, namely concept drift detection. Most concept drift detection algorithms in existing research work in supervised settings and are not suitable for clustering. In addition, the lack of prior knowledge, the uncertainty in the number of clusters, and limited computing resources are all issues that data stream clustering must consider. To address these problems, this paper studies concept drift detection strategies and adaptive clustering algorithms for data streams, based on statistical theory and cluster analysis. The specific contents are as follows: (1) A concept drift detection algorithm based on clustering partitions is proposed to detect concept drift in unlabeled data streams. The essence of concept drift is a change in the data distribution. In the unsupervised setting, the algorithm first applies a clustering-based uniform partitioning algorithm that partitions the data and forms a histogram representation, and then applies statistical hypothesis testing to the partition histograms to determine whether the data distribution has changed, to
achieve the purpose of concept drift detection. (2) A partition-based adaptive clustering algorithm for data streams is proposed, which reduces the impact of concept drift on clusters. The algorithm first estimates the number of clusters k: the estimation range is determined by observing the distribution of data features, and the optimal k is obtained through multiple tests to ensure clustering quality. In addition, based on the PH test, two stages, concept drift early warning and drift warning, are set up to detect concept drift during continuous clustering and to adjust adaptively, in a timely manner, to the cluster changes caused by concept drift. (3) Extensive experiments on the above work are carried out on different data sets, and the methods are explained and analyzed in combination with practical cases. The experimental results show that the proposed concept drift detection algorithm outperforms current mainstream detection algorithms in an unsupervised environment, improving detection quality by about 5%; the proposed data stream clustering algorithm adapts to concept drift in the data stream and outperforms current mainstream data stream clustering algorithms of the same type, improving clustering quality by about 15% with stable performance. In practical application cases, the proposed data stream clustering algorithm effectively clusters real-time street data streams and adjusts adaptively, and the clustering results can serve as a basis for judging road conditions and provide good decision support. These results show that the research in this paper can effectively detect concept drift in data streams in an unsupervised environment, achieve good clustering of data streams, and has practical application value. |
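
The idea behind contribution (1), comparing partition histograms with a hypothesis test, can be illustrated with a minimal sketch. It is not the thesis's algorithm. It assumes the partitions are given as fixed cluster centers, and it uses a chi-square homogeneity test as a stand-in, because the abstract does not name the test. The function names, the `centers` argument, and the `critical_value` parameter are all hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def assign(point: Point, centers: Sequence[Point]) -> int:
    """Index of the nearest center (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def histogram(window: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Count how many points of the window fall into each partition."""
    counts = [0] * len(centers)
    for p in window:
        counts[assign(p, centers)] += 1
    return counts


def chi_square_statistic(ref_counts: Sequence[int], cur_counts: Sequence[int]) -> float:
    """Chi-square homogeneity statistic for two histograms over the same bins."""
    n_ref, n_cur = sum(ref_counts), sum(cur_counts)
    if n_ref == 0 or n_cur == 0:
        raise ValueError("both windows must contain at least one point")
    total = n_ref + n_cur
    stat = 0.0
    for r, c in zip(ref_counts, cur_counts):
        col = r + c
        if col == 0:
            continue  # a bin that is empty in both windows carries no evidence
        for observed, row_total in ((r, n_ref), (c, n_cur)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def drift_detected(reference: Sequence[Point], current: Sequence[Point],
                   centers: Sequence[Point], critical_value: float) -> bool:
    """Flag drift when the statistic exceeds the chosen critical value."""
    stat = chi_square_statistic(histogram(reference, centers),
                                histogram(current, centers))
    return stat > critical_value


# Illustrative data only: three 1-D partitions, so 2 degrees of freedom.
centers = [(0.0,), (5.0,), (10.0,)]
reference = [(0.1,)] * 30 + [(5.2,)] * 30 + [(9.8,)] * 30
shifted = [(0.1,)] * 5 + [(5.2,)] * 5 + [(9.8,)] * 80
CRITICAL_005_DF2 = 5.991  # chi-square critical value for alpha = 0.05, df = 2
same_result = drift_detected(reference, reference, centers, CRITICAL_005_DF2)
shift_result = drift_detected(reference, shifted, centers, CRITICAL_005_DF2)
```

In this sketch the window compared with itself is not flagged, while the window whose mass has moved to the third partition is. In a streaming setting the reference window would be replaced after a detected drift.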
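
The two-stage detection in contribution (2) can be sketched in the same spirit. This example assumes that "PH test" refers to the Page-Hinkley test and that the two stages correspond to two thresholds on its statistic. The threshold values, the `delta` tolerance, and the choice of monitored quantity (for example, each point's distance to its nearest cluster center) are illustrative assumptions, not details taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Two-threshold Page-Hinkley detector for an increase in a stream's mean."""
    delta: float = 0.005           # tolerated change magnitude (assumed value)
    warn_threshold: float = 5.0    # hypothetical early-warning level
    drift_threshold: float = 10.0  # hypothetical drift level
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0      # cumulative deviation from the running mean
    cum_min: float = 0.0  # smallest cumulative deviation seen so far

    def update(self, x: float) -> str:
        """Consume one observation and return 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental running mean
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        stat = self.cum - self.cum_min
        if stat > self.drift_threshold:
            return "drift"
        if stat > self.warn_threshold:
            return "warning"
        return "stable"


# Illustrative stream: a stable phase followed by an abrupt jump in the mean.
detector = PageHinkley()
states = [detector.update(x) for x in [0.0] * 50 + [5.0] * 5]
```

On this stream the detector stays stable through the first phase, then passes through the warning state before reporting drift. A clustering algorithm could use the warning stage to start buffering recent points and the drift stage to re-estimate k and rebuild the clusters, though how the thesis couples the two is not stated in the abstract.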