Identification Method And Visualization Of Potential Pattern System In Big Data

Posted on:2023-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:M M Li

GTID:2568307163489654

Subject:Computer technology

Abstract/Summary:

To explore the hidden spatial characteristics and spatial relationships in data, practitioners often build characteristic parameters suited to their field from practical work experience and relevant professional theory, and then base data analysis and decision-making on those professional parameters. Although professional parameters play an indispensable role in data interpretation, from a data perspective they ignore, to some extent, the information carried by the data itself. For this reason, this thesis takes a seismic data volume as an example, quantifies the information in the data using statistics and information theory, and applies techniques such as feature selection and cluster analysis to mine knowledge and identify patterns in that information, with the aim of providing new ideas and methods for big data identification. The specific work is as follows:

To eliminate redundant features from feature sets, this thesis proposes SCFS, an unsupervised feature selection algorithm based on similarity and compactness. First, the algorithm uses the weighted k-nearest-neighbor density to calculate the similarity between features; then, it selects cluster centers and assigns features to them according to similarity; finally, it selects one representative feature from each cluster, according to the weighted redundancy and compactness of the features, to form the feature subset. The algorithm is applied to 8 real datasets, and the experimental results show that the feature subset it selects yields better clustering results.

To explore the internal structure behind the features, this thesis proposes a new clustering identification method, MDI-Kmeans. The method mainly addresses the sensitivity of the traditional K-means clustering algorithm to the initial cluster centers and to outliers. First, the method uses a combination of multiple clustering and density to determine the initial cluster centers; then, it constructs an adaptive sample weighting matrix based on the isolation forest algorithm; finally, it applies the adaptive sample weighting matrix in the clustering iterations to reduce the influence of outliers on clustering. The method is applied to 4 simulated datasets and 6 UCI datasets. The experimental results show that the proposed method not only improves the performance of the algorithm but also ensures the stability of the clustering results.

To better quantify the information in the data, this thesis takes the seismic data volume as an example and proposes a big data identification method based on statistics and information theory. First, the method uses statistics and information theory to construct features; then, it uses the proposed SCFS and MDI-Kmeans algorithms to perform feature selection and feature clustering; finally, the feasibility of the method is verified by visualizing the clustering results.
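The abstract does not give the details of MDI-Kmeans, but the general idea of applying per-sample weights in the K-means iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `weighted_kmeans`, the toy data, and the hand-set weights are all hypothetical. In the thesis the weights are derived from the isolation forest algorithm and the initial centers from multiple clustering and density, neither of which is reproduced here.

```python
import math


def weighted_kmeans(points, centers, weights, n_iter=20):
    """Lloyd-style K-means in which each sample contributes to its
    cluster mean in proportion to its weight (hypothetical helper)."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: weighted mean of the members of each cluster.
        new_centers = []
        for k, old in enumerate(centers):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == k]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(old)  # keep the center of an empty cluster
                continue
            new_centers.append(tuple(
                sum(w * p[d] for p, w in members) / total
                for d in range(len(old))
            ))
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers, labels


# Toy data (assumed): two compact groups plus one outlier at (30, 30).
points = [(0, 0), (0, 2), (2, 0), (2, 2),
          (10, 10), (10, 12), (12, 10), (12, 12),
          (30, 30)]
init = [(0, 0), (10, 10)]

# Uniform weights reproduce ordinary K-means.
plain_centers, _ = weighted_kmeans(points, init, [1.0] * len(points))

# Hand-set weights (assumed): the outlier is down-weighted to 0.1.
robust_weights = [1.0] * 8 + [0.1]
robust_centers, _ = weighted_kmeans(points, init, robust_weights)
```

With uniform weights the outlier drags the second center away from its group, while the down-weighted run keeps that center close to the group's own mean. Any outlier score mapped to a weight between 0 and 1 could stand in for the hand-set values used here.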

Keywords/Search Tags:

Statistics, Information theory, Unsupervised feature selection, Feature clustering, K-means

Related items

1	The Research And Application Of Clustering Feature Selection Methods
2	Research On Feature Selection Method Based On Three-way Decisions Theory And Feature Clustering
3	Study Of Feature Selection Based On Information Theory
4	Research On Robust Fuzzy Clustering Algorithm Based On Feature Selection
5	Research On Feature Selection Algorithm Based On Similarity
6	Research On Unsupervised Feature Selection Method Based On Regularized Regression Model
7	Precise Clustering Algorithm For Chinese Text Based On K-means
8	Research On Feature Selection Algorithms Based On Structure Information Of Samples And Features
9	Research On Unsupervised Balanced Feature Selection
10	Research And Application Of Spectral Clustering Algorithm