Lung Cancer Classification Prediction Based On Machine Learning Method

Posted on:2023-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:F C Wang

GTID:2544306614985209

Subject:Applied statistics

Abstract/Summary:

According to the latest cancer statistics report, lung cancer has the highest mortality of any cancer in China, accounting for one-fifth of cancer deaths. As the country's economy and society develop, China's cancer incidence profile is shifting from that of a developing country toward that of a developed country, and the high incidence of lung cancer is particularly prominent in males. Most confirmed cases are lung adenocarcinoma, lung squamous cell carcinoma, or small cell lung cancer. Each type has its own specific treatment, so the patient's type must be identified accurately before treatment so that the appropriate therapy can be chosen. At present, the methods used clinically to determine the lung cancer type are mostly invasive, such as puncture biopsy and surgical extraction of tissue. Invasive methods carry a risk of complications and can adversely affect the patient's treatment.

With the rapid growth of data, the large volume of medical data now available makes digital diagnosis possible. Establishing a complete set of non-invasive lung cancer subtype prediction models as an auxiliary diagnosis and treatment method is therefore of great significance. Based on the records of lung cancer patients admitted to a domestic tertiary hospital, this paper proposes a non-invasive scheme for diagnosing the lung cancer type, which uses machine learning methods to predict lung cancer subtypes. The main contents are as follows:

(1) Appropriate data preprocessing methods are selected according to the characteristics of medical data. Medical data are cluttered and irregularly recorded, with many missing values and, in some cases, missing sample labels. These problems make it very difficult to build classification models. This paper first uses K-nearest-neighbor imputation to fill in the missing values. Second, medical data are class-imbalanced because the subtypes differ in their underlying incidence, so the SMOTE oversampling method is used to balance the data.

(2) The optimal feature subset is selected using different methods. The data in this paper contain more than 60 indicators, including patient diagnostic information, laboratory indicators, and chronic disease history, and different machine learning models call for different features. Three major categories and five subcategories of feature selection methods are used: filter methods (Correlation Coefficient Method, Mutual Information Method, Relief-F, etc.), wrapper methods (Forward Selection, Backward Selection, Global Search), and embedded methods (LASSO, Ridge Regression).

(3) A lung cancer classification prediction model based on machine learning is proposed. Three machine learning methods (support vector machines, random forests, and probabilistic neural networks) are combined with the feature selection methods to build predictive models. Precision, recall, and AUC are used as evaluation metrics. In the end, the random forest combined with Relief-F feature selection gives the better predictive performance.
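The two preprocessing steps named in (1) can be illustrated with a small sketch. The abstract does not describe the thesis's actual implementation, so everything below is an assumption made for illustration: the function names `knn_impute` and `smote` are hypothetical, the toy data are invented, and the distance used for imputation (root mean squared difference over the features both rows have observed) is just one reasonable choice. It uses only the Python standard library; in practice one would more likely rely on an existing implementation such as scikit-learn's `KNNImputer` and imbalanced-learn's `SMOTE`.

```python
import math
import random


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper: distance is computed only over features that are
    observed in both rows, and averaged so rows with fewer shared features
    are not favoured.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows that actually observed feature j.
            donors = [(dist(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no row observed this feature
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolation (SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples with no missing values.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Toy data (invented): the second row is missing its second feature.
rows = [[1.0, 2.0], [1.2, None], [0.9, 2.2], [5.0, 9.0]]
filled = knn_impute(rows, k=2)

# Toy minority class (invented): three samples, oversampled with five more.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_new=5)
```

In this toy example the missing value is filled from the two closest rows, so the distant fourth row does not influence it. Imputation is applied before oversampling, matching the order given in the abstract, because the interpolation step needs complete feature vectors.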

Keywords/Search Tags:

Machine learning, Lung cancer classification, Feature selection, Data cleaning

Related items

1	Research On Medical Data Classification Based On Machine Learning
2	Breast Cancer Diagnosis Based On Feature Selection And Support Vector Machine
3	Classification Of Cancer Subtypes Based On Gene Expression Data
4	Application Research Of Improved Deep Extreme Learning Machine In Lung Cancer TCM Syndrome Classification
5	Research On Feature Modeling Method Of Survival Prognosis And Tumor Staging Of Lung Adenocarcinoma Based On Machine Learning
6	Fundamental Theory And Application Study On Large For Gestational Age Infants Using Machine Learning Techniques
7	Classification Of Brain Diseases Based On Multimodal Imaging And Machine Learning
8	Prognosis Research Of Lung Adenocarcinoma Based On Machine Learning
9	Application Of Machine Learning In Cancer Diagnosis
10	Prognostic Model Of Breast Cancer Patients Based On Feature Selection