Identification Method And Visualization Of Potential Pattern System In Big Data

Posted on: 2023-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: M M Li
GTID: 2568307163489654
Subject: Computer technology
Abstract/Summary:
To uncover the spatial characteristics and spatial relationships hidden in data, practitioners often construct domain-specific characteristic parameters from their working experience and the relevant professional theory, and then base data analysis and decision-making on those parameters. Although such professional parameters play an indispensable role in data interpretation, from a data perspective they partly ignore the information carried by the data itself. This thesis therefore takes a seismic data volume as an example, quantifies the information in the data using statistics and information theory, and applies feature selection and cluster analysis to mine knowledge and identify patterns in that information, offering new ideas and methods for big data identification. The specific work is as follows.

To eliminate redundant features from the feature set, this thesis proposes SCFS, an unsupervised feature selection algorithm based on similarity and compactness. The algorithm first uses a weighted k-nearest-neighbor density to compute the similarity between features; it then selects cluster centers and assigns each feature to a cluster according to similarity; finally, it selects one representative feature from each cluster, according to the features' weighted redundancy and compactness, to form the feature subset. The algorithm is applied to 8 real datasets, and the experimental results show that the feature subsets it selects yield better clustering results.

To explore the internal structure behind the features, this thesis proposes MDI-Kmeans, a new clustering identification method that addresses the sensitivity of the traditional K-means algorithm to the initial cluster centers and to outliers. The method first combines multiple clusterings with density to determine the initial cluster centers; it then constructs an adaptive sample weighting matrix based on the isolation forest algorithm; finally, it applies this weighting matrix in the clustering iterations to reduce the influence of outliers on the clustering. The method is applied to 4 simulated datasets and 6 UCI datasets, and the experimental results show that it both improves clustering performance and keeps the clustering results stable.

To better quantify the information in the data, this thesis takes the seismic data volume as an example and proposes a big data identification method based on statistics and information theory. The method first constructs features using statistics and information theory; it then applies the proposed SCFS and MDI-Kmeans algorithms for feature selection and feature clustering; finally, it verifies the feasibility of the method by visualizing the clustering results.
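The abstract names the three SCFS steps but not their formulas, so the following is only a minimal Python sketch under stated assumptions: feature similarity is taken as absolute Pearson correlation, the "weighted k nearest neighbor density" is assumed to be the mean similarity to a feature's k most similar features, cluster centers are chosen density-peak style (high density and far from any denser feature), and each cluster's representative maximizes an assumed score of compactness minus `alpha` times redundancy. The names `select_features`, `k` and `alpha` are hypothetical, not the thesis's own.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def select_features(columns, n_clusters, k=2, alpha=1.0):
    """Simplified SCFS-style selection (assumed formulas, not the thesis's).

    columns: list of feature columns, each a list of sample values.
    k, alpha: hypothetical neighbourhood size and redundancy weight.
    Returns the sorted indices of one representative feature per cluster.
    """
    d = len(columns)
    # Assumed similarity measure: absolute Pearson correlation between features.
    sim = [[1.0 if i == j else abs(pearson(columns[i], columns[j]))
            for j in range(d)] for i in range(d)]

    # Assumed weighted kNN density: mean similarity to the k most similar features.
    dens = []
    for i in range(d):
        top = sorted((sim[i][j] for j in range(d) if j != i), reverse=True)[:k]
        dens.append(sum(top) / len(top) if top else 0.0)

    # Distance (1 - similarity) to the nearest denser feature; ties broken by index.
    delta = []
    for i in range(d):
        higher = [1.0 - sim[i][j] for j in range(d)
                  if dens[j] > dens[i] or (dens[j] == dens[i] and j < i)]
        delta.append(min(higher) if higher else 1.0)

    # Density-peak-style centres: the features with the largest density * delta.
    centres = sorted(range(d), key=lambda i: dens[i] * delta[i], reverse=True)[:n_clusters]

    # Centres label themselves; every other feature joins its most similar centre.
    label = [i if i in centres else max(centres, key=lambda c: sim[i][c]) for i in range(d)]

    selected = []
    for c in centres:
        members = [i for i in range(d) if label[i] == c]
        outside = [i for i in range(d) if label[i] != c]

        def score(i):
            # Assumed score: compactness (mean similarity within the cluster)
            # minus alpha * redundancy (mean similarity to other clusters).
            inner = [sim[i][j] for j in members if j != i]
            compact = sum(inner) / len(inner) if inner else 0.0
            redundancy = sum(sim[i][j] for j in outside) / len(outside) if outside else 0.0
            return compact - alpha * redundancy

        selected.append(max(members, key=score))
    return sorted(selected)


base1 = [1, 2, 3, 4, 5, 6]
base2 = [1, 0, -1, -1, 0, 1]  # uncorrelated with base1
cols = [base1, [2 * x + 1 for x in base1], [1, 2, 3, 4, 5, 7],
        base2, [-3 * y + 2 for y in base2]]
print(select_features(cols, n_clusters=2))  # one index from {0, 1, 2}, one from {3, 4}
```

In the usage example, features 0 to 2 are near-duplicates of one signal and features 3 and 4 of another, so one representative per group should survive; the near-duplicates are exactly the redundancy SCFS is meant to remove.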
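For the outlier-handling step of MDI-Kmeans, a standard isolation forest can supply per-sample anomaly scores. The sketch below implements the usual scheme in plain Python: random axis-parallel splits, a depth limit of ceil(log2(psi)) for subsample size psi, and the score s(x) = 2^(-E[h(x)] / c(psi)). The mapping from score to weight (`weight = 1 - score`, read as the diagonal of a weighting matrix) is an assumption, since the abstract does not say how the thesis builds its adaptive weighting matrix; `isolation_scores` and `sample_weights` are hypothetical names.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, limit, rng):
    """Isolation tree: random feature and split value until isolated or depth limit."""
    if depth >= limit or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))


def path_length(x, node, depth=0):
    if node[0] == "leaf":
        # Unresolved leaves add the expected remaining depth c(size).
        return depth + c_factor(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s(x) = 2 ** (-E[h(x)] / c(psi)); values near 1 suggest outliers."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in points]


def sample_weights(points, **kwargs):
    """Hypothetical weighting: w_i = 1 - s_i, so likely outliers weigh less."""
    return [1.0 - s for s in isolation_scores(points, **kwargs)]


inliers = [(i * 0.1, j * 0.1) for i in range(4) for j in range(4)]
data = inliers + [(5.0, 5.0)]
w = sample_weights(data)
# The appended outlier (index 16) should receive the lowest weight.
print(min(range(len(data)), key=lambda i: w[i]))
```

Isolated points need few random splits, so their mean path is short and their score high; any monotone decreasing map from score to weight would serve the same purpose, and `1 - s` is simply the plainest choice.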
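The other two MDI-Kmeans ingredients, density-based initial centers and a weighted clustering iteration, can be sketched as follows. This is a simplified stand-in, not the thesis's method: initial centers are chosen greedily by density times distance to the centers already chosen, which omits the "multiple clustering" part of the initialization, and the sample weights are passed in directly (for example from an isolation forest). Each center is updated as the weighted mean of its members, so low-weight outliers pull it less. `density_init`, `weighted_kmeans` and `n_neighbors` are hypothetical names.

```python
import math


def density_init(points, k, n_neighbors=3):
    """Assumed density-based seeding: densest point first, then points that are
    both dense and far from the centres already chosen."""
    dens = []
    for p in points:
        nearest = sorted(math.dist(p, q) for q in points)[1:n_neighbors + 1]
        dens.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    chosen = [max(range(len(points)), key=lambda i: dens[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: dens[i] * min(
            math.dist(points[i], points[c]) for c in chosen)))
    return [list(points[i]) for i in chosen]


def weighted_kmeans(points, weights, k, max_iter=100):
    """K-means whose centre update is the weighted mean of each cluster's members."""
    centres = density_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            total = sum(weights[i] for i in members)
            if total > 0:  # keep the old centre if the cluster is empty
                centres[j] = [sum(weights[i] * points[i][d] for i in members) / total
                              for d in range(len(points[0]))]
    return labels, centres


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (30, 30)]
w = [1.0] * 8 + [0.05]  # hand-set: the last point is treated as an outlier
labels, centres = weighted_kmeans(pts, w, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With the outlier down-weighted to 0.05, the second center settles near (10.74, 10.74); with uniform weights the same run drags it to (14.4, 14.4), which is the outlier sensitivity the weighting is meant to reduce.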
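The abstract does not list which statistical and information-theoretic features are built from the seismic volume. Purely as an illustration, the sketch below computes, for non-overlapping windows along one trace, the mean, variance, kurtosis and the Shannon entropy of an amplitude histogram whose bins are shared across the whole trace. The feature set, `window_features`, `window` and `n_bins` are all assumptions; each window's row would become one sample for the feature selection and clustering steps described above.

```python
import math
from collections import Counter


def window_features(trace, window, n_bins=8):
    """Per-window statistics and amplitude entropy for one 1-D trace (hypothetical feature set).

    Histogram bins span the whole trace so entropies are comparable across windows.
    """
    lo, hi = min(trace), max(trace)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant trace
    rows = []
    for start in range(0, len(trace) - window + 1, window):  # non-overlapping windows
        w = trace[start:start + window]
        n = len(w)
        mean = sum(w) / n
        var = sum((x - mean) ** 2 for x in w) / n
        # Non-excess kurtosis m4 / m2^2 (0.0 for a flat window).
        kurt = (sum((x - mean) ** 4 for x in w) / n) / var ** 2 if var > 0 else 0.0
        counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in w)
        # Shannon entropy, in bits, of the window's amplitude histogram.
        entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        rows.append({"mean": mean, "variance": var, "kurtosis": kurt, "entropy": entropy})
    return rows


rows = window_features([0.0] * 8 + [0, 1, 2, 3, 4, 5, 6, 7], window=8)
print([round(r["entropy"], 3) for r in rows])  # [0.0, 3.0]
```

A flat window puts every sample in one bin (entropy 0 bits), while a window whose eight samples fall in eight different bins reaches the maximum of log2(8) = 3 bits.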
Keywords/Search Tags: Statistics, Information theory, Unsupervised feature selection, Feature clustering, K-means