| With the rapid development of information technology, databases have grown steadily in scale and depth, accumulating huge amounts of data in different storage forms. The stock industry is undoubtedly data-intensive: daily transaction data and other related data reach the GB (gigabyte, 1 billion bytes) level, and long-term stock trading data can be considered massive and effectively unbounded. A variety of valuable information lies hidden in these vast amounts of data. As an important branch of data mining, cluster analysis studies the similarities between data items and groups similar items into the same category. Clustering stocks of similar value into one category makes it possible to grasp the general tendency of a stock and judge its potential value. Stock trading data, and stock market quotation data in particular, arrive in real time, are continuous, and change constantly. These are the typical characteristics of streaming data, so stock market data should be analyzed and mined with streaming data mining techniques. This paper therefore uses a streaming data clustering algorithm to cluster stock market transaction data. First, the paper reviews existing applications of traditional clustering algorithms to the stock market and the literature on streaming data clustering algorithms. Second, it describes streaming data clustering techniques in detail, covering three models of stream data (the time series model, the cash register model, and the turnstile model); four common summary data structures (feature vector, prototype array, coreset tree, and grid); and four commonly used window models (landmark window, sliding window, damped window, and tilted time window). The paper then analyzes the background and current state of streaming data clustering in stock market analysis. Based on the characteristics of stock market data, it surveys recent results in data stream mining, compares their advantages and disadvantages, selects the density-based data stream clustering algorithm D-Stream, and gives a comprehensive introduction to the concepts and ideas behind the algorithm. Finally, the paper takes the transaction data of 50 stocks listed on the Shanghai Stock Exchange and uses transaction price and volume as the clustering indicators. The online part of the D-Stream algorithm maps each input data record to a grid; the offline part calculates grid density and merges grids into clusters based on that density, with density attenuation used to capture dynamic changes in the data stream. At every time interval, the online part continuously reads new data records, places each multidimensional record into the corresponding discrete density grid in the multidimensional space, and updates the feature vectors of the density grids. In this paper, the attenuation factor is set to 0.01, the grid size to 0.2, and the grid density to 10, and the relationship between attenuation factor, data density, and cluster structure is used to generate and adjust clusters efficiently in real time. The final clustering result is then analyzed. The results show that the stocks under study fall cleanly into four groups: the financial sector, infrastructure and heavy industry, manufacturing, and high-tech and service industries. The financial, infrastructure and heavy industry, and manufacturing groups show a correlation that cannot be ignored; the stocks in these three groups belong to traditional industries, make up the majority of the stocks studied, and account for the main part of total market value. Based on the clustering result, the paper also groups the stocks by value and makes a preliminary forecast of each stock's future price trend: if a stock is priced below its relative value, its price tends to rise; if it is priced above its relative value, its price tends to fall. Clustering stocks in this way and delivering the results to investors in real time can help them understand the general characteristics of a stock accurately, determine the scope of their investment, and predict stock prices from trends in the overall price level of each category, so that users can query and analyze the stock market in real time and choose favorable investment opportunities. |
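
The online step described in the abstract (map each record to a grid, then update that grid's decayed density) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `Cell`, `cell_key`, `update`, and `dense_cells` are hypothetical. The sketch assumes that records are (price, volume) pairs normalized to [0, 1], that the attenuation factor 0.01 means a density is multiplied by (1 - 0.01) per time step, and that the "grid density" value 10 acts as a threshold for calling a grid dense. The abstract does not confirm any of these readings.

```python
import math
from dataclasses import dataclass

GRID_SIZE = 0.2      # grid size quoted in the abstract
ATTENUATION = 0.01   # attenuation factor quoted in the abstract
DENSITY_LEVEL = 10   # "grid density" quoted in the abstract; treated as a threshold (assumption)


@dataclass
class Cell:
    """Summary kept for one grid: its decayed density and last update time."""
    density: float = 0.0
    last_update: int = 0


def decay(elapsed: int) -> float:
    # Assumed decay form: multiply by (1 - ATTENUATION) once per elapsed time step.
    return (1.0 - ATTENUATION) ** elapsed


def cell_key(record, size=GRID_SIZE):
    """Map a normalized record, e.g. (price, volume), to integer grid coordinates."""
    return tuple(math.floor(x / size) for x in record)


def update(grids, record, t):
    """Online step: decay the grid's stored density to time t, then add the new record."""
    key = cell_key(record)
    cell = grids.setdefault(key, Cell(last_update=t))
    cell.density = cell.density * decay(t - cell.last_update) + 1.0
    cell.last_update = t
    return key


def dense_cells(grids, t):
    """Offline helper: grids whose density, decayed to time t, reaches the threshold."""
    return {k for k, c in grids.items()
            if c.density * decay(t - c.last_update) >= DENSITY_LEVEL}


grids = {}
update(grids, (0.45, 0.10), t=0)   # falls in grid (2, 0)
update(grids, (0.41, 0.15), t=1)   # same grid, one time step later
```

In this sketch, a full offline phase would go on to merge neighboring dense grids into clusters; that step is omitted because the abstract gives no detail on how the paper performs it.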