The Fuzzy Clustering Algorithm Research Based On Cancer Gene Data

Posted on: 2023-06-27 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Li | Full Text: PDF
GTID: 2544306848481354 | Subject: Computer technology
Abstract/Summary:
With the rapid development of computer technology, data storage technology, and data mining technology, these tools have come to play a major role in the diagnosis and treatment of disease in modern medicine. Cancer has always been one of the leading threats to human life. As the quality of life improves, the incidence of cancer increases year by year, yet the detection and diagnosis of cancer remain immature. First, the characteristics of cancer cells and of the tumor itself make cancer hard to diagnose. Second, cancer data processing is a major problem at present: cancer data samples are few, the distribution of positive and negative samples is extremely unbalanced, and the samples have very high dimensionality, which makes data analysis, data processing, and diagnosis difficult by conventional means. Methods that process and analyze cancer tumor data with the aid of artificial intelligence and data mining therefore came into being.

Data mining is used to explore the usable information in the raw data, and the information discovered is then used for clustering. In this way cancer data can be integrated and analyzed to achieve a better clustering result. First, clustering is used to find specific subtypes of cancer so that targeted treatment can be provided. Second, data analysis is used to analyze drugs and tumor genes, identify the associated drugs, and provide personalized treatment to prolong the survival of patients with various cancer subtypes. This study aims to provide a reliable approach to the clustering of cancer tumor data. Because cancer data interact, a sample cannot be assigned to a single category. This study therefore uses the kernel fuzzy C-means clustering algorithm (KFCM). The research directions and contents are as follows.

The KFCM algorithm requires the number of clusters to be specified manually, its performance degrades because it is sensitive to noise in the data, and the mutual influence of edge data points leads to classification errors. To solve these problems, this study proposes an improved fuzzy algorithm, C-KFCM. The Canopy coarse clustering algorithm is first used to give a rough number of clusters for the data set, and KFCM is then used for clustering. This study also improves the membership function of the original KFCM algorithm by introducing the average membership value of neighboring data points into the membership degree of noise points and edge data, so that the influence of noise on the algorithm is reduced or eliminated. Experiments show that C-KFCM can determine the number of clusters automatically and that, compared with the original algorithm, C-KFCM improves the average accuracy and gives a more stable clustering result.

KFCM has further problems in cancer gene clustering: the relationships between data cannot be mined in depth, and the algorithm's performance is affected by outlier data points. To address them, a new algorithm, KFCM-S, is proposed, which combines KFCM with spectral clustering. It uses spectral clustering to give KFCM a better ability to find the relationships between data. In addition, the membership function of KFCM is improved to reduce the influence of outliers and abnormal data points on performance and to make the algorithm more robust. Experimental results show that the improved algorithm handles the relationships between data better and that its performance is more stable.

The two algorithms proposed in this study, C-KFCM and KFCM-S, improve the original algorithm to different degrees and are verified by experiments. The experiments use artificial data sets and cancer tumor data sets and apply several clustering evaluation measures. On the artificial data sets, the experiments demonstrate the feasibility and stability of the proposed algorithms, and the evaluation functions show a slight improvement in performance and accuracy. On the real cancer tumor data sets, the algorithms show a stronger clustering analysis ability and a stronger ability to discover the internal relationships in the data, which can play a greater role in guiding treatment in modern medicine.
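
The pipeline described for C-KFCM (a coarse Canopy pass to estimate the number of clusters, followed by KFCM) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: it uses the standard KFCM update rules with a Gaussian kernel and input-space prototypes, and it does not include the improved membership function for noise and edge points. The function names, the thresholds `t1` and `t2`, the kernel width `sigma`, the fuzzifier `m`, and the synthetic two-blob data are all hypothetical choices made for the example.

```python
import numpy as np


def canopy_centers(X, t1, t2):
    """Coarse Canopy pass: one center per canopy (needs t1 > t2 > 0)."""
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = np.arange(len(X))
    centers = []
    while remaining.size > 0:
        seed = X[remaining[0]]
        dist = np.linalg.norm(X[remaining] - seed, axis=1)
        # Points within the loose threshold t1 form the canopy;
        # here the canopy center is taken as their mean (an assumption).
        centers.append(X[remaining[dist < t1]].mean(axis=0))
        # Points within the tight threshold t2 stop being candidates.
        remaining = remaining[dist >= t2]
    return np.array(centers)


def kfcm_memberships(X, V, m, sigma):
    """Standard KFCM memberships for a Gaussian kernel; also returns K."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / sigma**2)                    # K[k, i] = K(x_k, v_i)
    # Kernel-induced distance is proportional to 1 - K; clamp to avoid 0.
    w = np.maximum(1.0 - K, 1e-12) ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=1, keepdims=True), K


def kfcm(X, V0, m=2.0, sigma=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and prototype updates until prototypes settle."""
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(max_iter):
        U, K = kfcm_memberships(X, V, m, sigma)
        G = (U ** m) * K                          # kernel-weighted memberships
        V_new = (G.T @ X) / G.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    U, _ = kfcm_memberships(X, V, m, sigma)
    return U, V


# Synthetic data for illustration only: two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])

V0 = canopy_centers(X, t1=3.0, t2=2.5)   # canopy count gives the cluster number
U, V = kfcm(X, V0)                       # canopy centers seed the prototypes
labels = U.argmax(axis=1)                # hard labels from fuzzy memberships
print("estimated clusters:", len(V0))
```

In this sketch the number of canopies stands in for the manually specified cluster count, and each row of `U` holds one sample's fuzzy memberships across clusters, which reflects the point that a sample need not belong to a single category. The result is sensitive to `t1`, `t2`, and `sigma`, which would need tuning on real gene expression data.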
Keywords/Search Tags: Oncogenetic Data, Fuzzy Clustering Algorithm, KFCM, Spectral Clustering, Canopy