Research On Uncertainty Measure And Algorithms For Data With Imbalanced Density

Posted on:2023-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Liu

GTID:2558306932463794

Subject:Mathematics

Abstract/Summary:

In the era of big data, research on generalizing uncertainty measures is increasingly important. Using a relative distance measure, this thesis studies machine learning tasks (feature selection, classification, regression analysis, and cluster analysis) for data with an imbalanced density distribution. The main research contents are as follows.

First, existing distance measures are summarized. The analysis finds that a relative distance measure can effectively reduce the effect of imbalanced data density on distance measurement.

Second, the sensitivity of the classical fuzzy rough set model to data distribution is analyzed. To address this sensitivity, a relative uncertainty measure is proposed by combining the relative distance measure with the fuzzy lower approximation. When measuring the classification uncertainty of a sample, this index considers the sample's local distribution and so reduces the influence of sample density on the uncertainty measurement. A relative fuzzy dependency is then defined based on the proposed uncertainty measure, and a feature selection algorithm based on relative fuzzy dependency is designed. Experimental results show that the uncertainty measure based on relative distance is effective and efficient.

Third, the thesis analyzes why the classification hyperplane obtained by the classical SVM algorithm is unreasonable when the class density distribution is imbalanced. To address this, relative distance measurement is introduced into the SVM model, yielding an improved model referred to as the standard SVM model. Experimental results show that the standard SVM model obtains an effective and interpretable classification hyperplane when the class density distribution is imbalanced. It is also found that an imbalanced data density distribution strongly influences the least squares regression model, making the fitted regression model unreasonable in some respects. To address this, the relative distance measure is introduced into least squares regression, and an improved regression model is proposed. Experimental results show that the improved least squares method adapts effectively to data with an imbalanced density distribution and achieves high prediction accuracy.

Fourth, the sensitivity of the k-means algorithm to data distribution is analyzed; the results show that differences in cluster density seriously degrade its clustering performance. Relative distance measurement is therefore introduced into k-means clustering, and an improved k-means clustering model is proposed. The improved model takes cluster density information into account and uses relative distance to assign samples to clusters. Experimental results show that relative distance effectively improves clustering performance and gives the improved algorithm a degree of generalization ability.
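The abstract does not give the definition of the relative distance measure, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes one simple form: the Euclidean distance from a sample to a cluster centre divided by that cluster's spread (the mean distance of its members to the centre). All function names (`cluster_scale`, `relative_distance`, `assign`) and the toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def cluster_scale(points, center):
    """ASSUMPTION: mean member-to-centre distance as a stand-in for cluster density."""
    return sum(euclidean(p, center) for p in points) / len(points)

def relative_distance(x, center, scale, eps=1e-12):
    """ASSUMPTION: distance divided by the cluster's own scale (eps avoids division by zero)."""
    return euclidean(x, center) / (scale + eps)

def assign(x, centers, scales, relative=True):
    """Return the index of the nearest cluster under the chosen distance."""
    if relative:
        costs = [relative_distance(x, c, s) for c, s in zip(centers, scales)]
    else:
        costs = [euclidean(x, c) for c in centers]
    return min(range(len(centers)), key=costs.__getitem__)

# Toy 1-D data: a tight cluster near 0 and a spread-out cluster around 10.
dense = [(0.0,), (0.1,), (-0.1,), (0.05,), (-0.05,)]
sparse = [(10.0,), (6.0,), (14.0,), (8.0,), (12.0,)]

centers = [centroid(dense), centroid(sparse)]
scales = [cluster_scale(dense, centers[0]), cluster_scale(sparse, centers[1])]

query = (4.0,)  # lies inside the range of the sparse cluster
plain = assign(query, centers, scales, relative=False)
rel = assign(query, centers, scales, relative=True)
print("Euclidean assignment:", plain)
print("Relative-distance assignment:", rel)
```

In this toy case the query point is closer to the dense cluster's centre in absolute terms, yet it falls within the sparse cluster's extent; scaling each distance by the cluster's spread changes the assignment accordingly. Other normalizations (for example, one based on k-nearest-neighbour distances) would be equally plausible readings of "relative distance".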

Keywords/Search Tags:

Relative distance measurement, Relative uncertainty, Density distribution, Feature selection, Prediction and classification model

Related items

1	Research On Frequency Tunability And High-precision Relative Distance Measurement Technology Based On Optoelectronic Oscillator
2	Image Relative Attribute Learning And Application
3	Research On Support Vector Data Description Based On Relative Density Degree
4	Non-uniform Data Clustering Method Based On Relative Density
5	Single-image Signal-dependent Noise Parameter Estimation Method Based On Relative Density Peak Clustering
6	A supervisory intelligent robot control system for a relative pose-based strategy
7	A Texture Feature Extraction Method Based On Frequent Itemsets In Relative Phase Domain And Its Application In Image Classification
8	Research Of Feature Selection Based On Evolutionary Algorithms
9	Research On GPS/BDS Precision Relative Positioning Technology
10	Ensemble Clustering Using Maximum Relative Density Path