| In the era of big data, research on generalizing uncertainty measures is becoming increasingly important. This paper studies machine learning algorithms, including feature selection, classification, regression analysis, and clustering analysis, for data with imbalanced density distributions, based on a relative distance measure. The main research contents are as follows. First, existing distance measures are summarized; the analysis finds that the relative distance measure can effectively reduce the effect of imbalanced data density distribution on distance measurement. Second, the sensitivity of the classical fuzzy rough set model to data distribution is analyzed. To address this sensitivity, a relative uncertainty measure is proposed by combining the relative distance measure with the fuzzy lower approximation. When measuring the classification uncertainty of samples, this measure takes the local distribution of samples into account and reduces the influence of sample density on the uncertainty measurement. A relative fuzzy dependency is then defined on the basis of the proposed uncertainty measure, and a feature selection algorithm based on relative fuzzy dependency is designed. Experimental results show that the uncertainty measure based on relative distance is effective and efficient. Third, the paper analyzes why the classification hyperplane obtained by the classical SVM algorithm is unreasonable when the class density distribution of the data is imbalanced. To solve this problem, the relative distance measure is introduced into the SVM model, yielding an improved SVM model, referred to as the standard SVM model. Experimental results show that the standard SVM model obtains an effective and interpretable classification hyperplane when the class density distribution is imbalanced. It is also found that imbalanced data density distribution strongly influences the least squares regression model, making the fitted regression model unreasonable in some respects. To solve this problem, the relative distance measure is introduced into least squares regression, and an improved regression model is proposed. Experimental results show that the improved least squares method adapts effectively to data with imbalanced density distribution and achieves high prediction accuracy. Fourth, the sensitivity of the k-means algorithm to data distribution is analyzed, and the results show that differences in cluster density seriously affect its clustering performance. The relative distance measure is therefore introduced into k-means clustering, and an improved k-means clustering model is proposed. The improved model takes full account of cluster density information and uses relative distance to assign samples to clusters. Experimental results show that relative distance effectively improves clustering performance and gives the improved algorithm a degree of generalization ability. |
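
The abstract does not give a formula for the relative distance, so the following is only an illustrative sketch of the general idea in the clustering setting, not the paper's definition. It assumes one possible form: the Euclidean distance from a sample to a cluster centroid, divided by that cluster's mean member-to-centroid distance. The names `centroid`, `mean_radius`, and `assign`, and the toy data, are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def mean_radius(points, center):
    """Average Euclidean distance from cluster members to the centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)


def assign(x, clusters, relative=True):
    """Return the label of the cluster whose centroid is nearest to x.

    With relative=True the distance is divided by the cluster's mean radius
    (an ASSUMED normalization, not taken from the paper), so a sparse
    cluster is not penalized for being spread out.
    """
    best_label, best_score = None, math.inf
    for label, points in clusters.items():
        c = centroid(points)
        d = math.dist(x, c)
        if relative:
            # Guard against a zero radius (e.g. a single-point cluster).
            d /= max(mean_radius(points, c), 1e-12)
        if d < best_score:
            best_label, best_score = label, d
    return best_label


# Toy data: a dense cluster near the origin and a sparse one near (10, 0).
clusters = {
    "dense": [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, -0.1)],
    "sparse": [(7.0, 0.0), (13.0, 0.0), (10.0, 3.0), (10.0, -3.0)],
}
query = (4.0, 0.0)

print(assign(query, clusters, relative=False))
print(assign(query, clusters, relative=True))
```

In this toy setting the query point is closer to the dense cluster's centroid in absolute terms (4 versus 6), but it lies about 40 mean radii from the dense cluster and only about 2 from the sparse one, so the two rules assign it differently. This is the kind of density effect the abstract attributes to the absolute distance.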