With the development of high-throughput sequencing technology, a large amount of cancer gene expression data has emerged. These data cover different types of cancer, which greatly helps cancer research and treatment but also poses major challenges for the analysis and processing of cancer gene data. These challenges include high redundancy, polymorphism, high dimensionality, and high noise. Efficient data processing techniques are therefore needed to extract useful information from gene data. Cluster analysis, a common data mining technique, has been applied to the analysis of cancer gene expression data. However, because of the complexity of these data, many common clustering algorithms perform poorly on them, and high dimensionality blurs the boundaries between groups of samples. Fuzzy clustering, which builds on fuzzy set theory, is therefore better suited to cluster analysis of cancer gene expression data. This paper improves the fuzzy clustering algorithm to address the problems that arise when it is applied to cancer gene expression data. The main research contents are as follows:

(1) To address the problem that fuzzy clustering depends strongly on the initial cluster centers and easily falls into local optima, a fuzzy clustering algorithm combining the Cauchy distribution with the ant lion algorithm (CALOFCM) is proposed. Firstly, an ant lion optimization algorithm with Cauchy-distribution mutation is introduced, which weakens the pull of local extreme points on individuals and thus increases the probability of escaping from a local optimum. Secondly, the elite ant lions generated by the optimized ant lion algorithm are used as the initial cluster centers of the Fuzzy C-Means (FCM) algorithm. Finally, comparison experiments on UCI data sets and cancer gene expression data sets show that, compared with the k-means, DBSCAN, FCM, and ALOFCM algorithms, the proposed algorithm can escape from local optima and obtains better clustering results.

(2) To address the large volume of cancer gene expression data and the large amount of redundant information it contains, a weighted cancer gene fuzzy clustering algorithm based on Fisher linear discrimination (FLDAFCM) is proposed. Firstly, Fisher linear discriminant analysis is introduced, and the contribution rate of each attribute to the sample data is determined using the Fisher linear discriminant rate. Then the weight formula is derived from these contribution rates and used to improve the fuzzy clustering algorithm. Finally, experimental verification is conducted on the UCI data sets and the cancer gene expression data sets, comparing FCM, DBSCAN, and the CALOFCM and FLDAFCM algorithms proposed in this paper. The experimental results show that the weighted cancer gene fuzzy clustering algorithm combined with Fisher linear discrimination achieves better clustering results on high-dimensional data, and that cluster analysis of the data sets yields clustering results with medical value.
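
The two modifications described above both plug into the standard FCM iteration: CALOFCM changes where the initial cluster centers come from, and FLDAFCM changes how much each attribute contributes to the distance. The following is a minimal sketch of that shared core, not the implementation used in this paper. The function name `weighted_fcm` and its parameters are hypothetical, and it assumes the weights enter as a per-attribute weighted squared Euclidean distance. The initial centers and the weights are simply passed in by the caller; the ant lion search and the Fisher-based weight formula are not reproduced here.

```python
import numpy as np


def weighted_fcm(X, init_centers, weights=None, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy C-Means with caller-supplied initial centers and attribute weights.

    X: (n_samples, n_features) data matrix.
    init_centers: (c, n_features) starting centers (hypothetically, elite ant lions).
    weights: (n_features,) non-negative attribute weights; uniform if None.
    Returns (centers, U), where U[i, k] is the membership of sample i in
    cluster k as computed in the final iteration.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)

    for _ in range(max_iter):
        # Weighted squared Euclidean distance from every sample to every center.
        diff = X[:, None, :] - centers[None, :, :]
        d2 = np.maximum((w * diff ** 2).sum(axis=2), 1e-12)

        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)); rows sum to 1.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

        # Center update: mean of the samples weighted by u_ik^m.
        Um = U ** m
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]

        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U


# Toy usage on two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
centers, U = weighted_fcm(X, init_centers=X[[0, -1]], weights=[1.0, 1.0])
labels = U.argmax(axis=1)
```

Because the initial centers are an explicit argument, a poor choice can be swapped for a better one without touching the iteration itself, which is the point at which an initialization strategy such as the one in (1) would connect. Setting an attribute's weight to zero removes it from the distance entirely, which shows how attribute weighting can suppress redundant dimensions.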