A Classification Model For Gene Expression Data Of Cancer Patients

Posted on: 2024-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2544307127453834
Subject: Software engineering
Abstract:
Cancer is a debilitating disease characterized by the uncontrolled growth and division of abnormal cells. Available treatments, such as chemotherapy, radiotherapy, and surgery, can cause significant physical and psychological pain. Mutations in key genes are known to contribute to cancer development, which highlights the importance of studying gene expression data. Advances in sequencing technologies have led to the production of high-quality gene expression data, which is vital for exploring biological, medical, and disease mechanisms. Machine learning algorithms can be used to investigate the relationship between gene expression data and cancer progression, supporting personalized and precise treatment plans for patients. However, analyzing cancer gene expression data is challenging because of small sample sizes, high dimensionality, high noise, and class imbalance. In this study, osteosarcoma and gastric cancer are chosen as representative cancers, and their specific characteristics are combined with the common features of gene expression datasets to study the following aspects:

1) This paper proposes a novel classification model for osteosarcoma gene expression data based on Weighted Multi-Source Data Fusion (W-MSDF) and Excitation-based Convolutional Neural Networks with Support Vector Machines (E-CNN-SVM). The data processing stage uses an improved weighting mechanism, inspired by multi-view algorithms, to fuse the feature extraction information from different data sources. This increases the intrinsic connectivity of small-sample data and alleviates the problem of insufficient data volume. The E-CNN-SVM classification algorithm combines convolutional neural networks and support vector machines with an excitation mechanism, inspired by squeeze-and-excitation networks, that increases the weight of core features and improves performance when classifying small-sample data. The experimental results demonstrate that the proposed model can effectively improve the classification accuracy of osteosarcoma gene expression data.

2) This paper proposes a classification model for gastric cancer DNA methylation data that combines Wavelet Threshold Denoising-Random Forest (WRF), Sample Expanding (SE), and an Excitation-based Stacked Autoencoder (E-SAE). In the data processing stage, a denoising autoencoder randomly corrupts some gene fragments to expand the number of training samples and to make the model more robust against overfitting. The WRF algorithm is also used to improve the feature selection ability of the model. In the classification module, E-SAE is used to suppress the influence of samples with low importance. The experimental results demonstrate that the proposed model can effectively improve the classification performance on gastric cancer DNA methylation data and avoid overfitting.

3) Building on the classification models proposed above for patients with osteosarcoma and gastric cancer, this paper develops intelligent classification software for use in practical medical scenarios. The system is designed and implemented with the PyQt framework and the Python language, after an analysis of its requirements and feasibility. By applying the gene expression data classification models, the software enables medical staff to classify and diagnose patients with gastric cancer and osteosarcoma efficiently, which reduces their workload and improves overall efficiency.
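Two of the ideas named in the abstract, sample expansion by random corruption and an excitation gate that reweights features, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`expand_samples`, `excite`), the masking-noise corruption, the default parameters, and the weight shapes are all assumptions chosen for illustration, and the excitation gate is a simplified, vector-level analogue of a squeeze-and-excitation block.

```python
import math
import random


def expand_samples(samples, drop_prob=0.2, copies=3, seed=0):
    """Return the originals plus `copies` corrupted variants of each sample.

    Corruption here is masking noise (an assumed, illustrative choice):
    each feature is independently set to 0.0 with probability `drop_prob`.
    """
    rng = random.Random(seed)
    expanded = [list(s) for s in samples]
    for sample in samples:
        for _ in range(copies):
            expanded.append(
                [0.0 if rng.random() < drop_prob else x for x in sample]
            )
    return expanded


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def excite(features, w_down, w_up):
    """Reweight a feature vector with a squeeze-and-excitation style gate.

    w_down: bottleneck weights, shape (hidden, n_features)   [hypothetical]
    w_up:   expansion weights, shape (n_features, hidden)    [hypothetical]
    Returns (gated_features, gates); every gate lies in (0, 1).
    """
    # Bottleneck layer with ReLU activation.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
              for row in w_down]
    # One sigmoid gate per input feature.
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
             for row in w_up]
    return [g * x for g, x in zip(gates, features)], gates


if __name__ == "__main__":
    data = [[0.5, 1.2, 0.0, 2.1], [1.0, 0.3, 0.7, 0.9]]
    bigger = expand_samples(data, drop_prob=0.25, copies=2)
    print(len(bigger))  # 2 originals + 2 * 2 corrupted copies = 6

    w_down = [[0.1, 0.2, 0.0, 0.3], [0.0, -0.1, 0.4, 0.2]]
    w_up = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
    gated, gates = excite(data[0], w_down, w_up)
    print(gates)
```

In a trained model the gate weights would be learned together with the classifier, so that features with low gates are suppressed and core features are emphasized; here they are fixed by hand only to show the data flow.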
Keywords: Small sample, Osteosarcoma, Gastric cancer, Gene expression data, Excitation