Cancer is currently one of the most serious threats to human life and health. It is highly complex and heterogeneous, and researchers and medical experts have studied it for decades. Breast cancer is among the most common cancers, with the second-highest incidence worldwide after lung cancer. Detecting and identifying cancer subtypes is an important part of cancer detection, clinical prognosis, and recovery, and it has key practical value for providing specialized, accurate treatment plans to patients. Cancer multi-omics data are typically high-dimensional and noisy, with few samples, which poses great challenges to traditional data mining and analysis methods. At present, cancer subtypes are mainly studied through the functions of individual molecules and their related systems or networks, but the function observed in a single sample is incomplete, and the relevant systems or networks change with time and conditions. Estimating the specific network of an individual sample therefore plays a crucial role in understanding complex diseases from a single cancer sample. In recent years, machine learning algorithms have made remarkable progress in big data analysis, and some deep learning algorithms in particular have performed very well. However, networks built from a single sample have not yet been used to study cancer subtypes at the level of individual-specific network construction, with the network distance between samples used to cluster the different subtypes.

Based on this network distance, this paper uses individual-specific networks to cluster the breast cancer data in the TCGA database. It mainly studies single-sample network construction and a breast cancer subtype clustering model built on it, and performs a survival analysis on the resulting distance model to demonstrate its effectiveness. It also explores preprocessing, feature selection, and clustering methods for multi-omics cancer data.

The main work of this paper comprises two parts. First, for the individual-specific network, that is, the single-sample network model, the paper introduces the theoretical basis of the method and the learning of the network distance. The results show that the method can take any two samples, generate the network distance between them, and use that distance to cluster individual samples, identify subtypes, and obtain clustering results. The effectiveness of the method for subtype clustering is verified by survival analysis. Second, a clustering model based on single-sample network distance is proposed. The data come mainly from the TCGA database, and network construction and clustering analysis are carried out after data preprocessing and feature selection. Hierarchical clustering separates normal samples from tumor samples with an accuracy as high as 98%, which demonstrates the feasibility of clustering with single-sample-specific networks. In the experiments, more than 1,000 breast cancer samples were compared and analyzed using two different clustering methods and divided into five cancer subtypes. The validity and feasibility of the clustering results were verified through survival analysis, and the results were visualized with PCA and t-SNE to further confirm their effectiveness.
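The pipeline summarized above (one network per sample, a distance between networks, then hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not state how the single-sample network or the network distance is defined, so everything below is an assumption made for illustration: each sample's network is taken as the change in the Pearson correlation matrix when that sample is added to a reference group, the network distance is the Euclidean distance between edge-weight vectors, and average linkage is used for clustering. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def single_sample_networks(reference, samples):
    """Return one perturbation network per sample, as a vector of edge weights.

    reference: (n_ref, n_genes) expression matrix of reference samples.
    samples:   (n_samples, n_genes) expression matrix of samples to profile.
    Assumed construction: edge weight = correlation with the sample added
    minus correlation in the reference group alone.
    """
    n_genes = reference.shape[1]
    upper = np.triu_indices(n_genes, k=1)  # each gene pair counted once
    base = np.corrcoef(reference, rowvar=False)
    networks = []
    for x in samples:
        perturbed = np.corrcoef(np.vstack([reference, x]), rowvar=False)
        networks.append((perturbed - base)[upper])
    return np.array(networks)


def cluster_by_network_distance(networks, n_clusters):
    """Hierarchical clustering on pairwise distances between sample networks."""
    distances = pdist(networks, metric="euclidean")  # assumed network distance
    tree = linkage(distances, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic stand-in for expression data; real input would be preprocessed
# TCGA profiles after feature selection.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 6))
samples = rng.normal(size=(20, 6))

networks = single_sample_networks(reference, samples)
labels = cluster_by_network_distance(networks, n_clusters=2)
print(networks.shape, labels)
```

Working on the upper triangle keeps each gene pair once, which is enough here because the diagonal of a correlation difference is always zero. With real data the number of gene pairs grows quadratically with the number of selected genes, which is one reason feature selection would come before network construction.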