Medical Data Set Filling And Classification Based On Machine Learning

Posted on: 2021-03-17

Degree: Master

Type: Thesis

Country: China

Candidate: X Q Chen

Full Text: PDF

GTID: 2494306107962569

Subject: Applied Statistics

Abstract/Summary:

In recent years, information technology, big data technology and machine learning have made great progress, and the concept of a Healthy China has gradually been popularized. Large volumes of medical data can provide potentially valuable information, and applying machine learning methods to medical data sets has become a research hotspot, since it can help medical staff diagnose disease more efficiently and relieve some of the pain that treatment causes patients. Medical data sets may, however, contain missing values because of operational errors by data collection personnel or the limits of measurement technology. This paper therefore addresses the problem of missing values in medical data sets: it selects several reasonable filling methods to fill the missing values, and then uses machine learning classification algorithms to build a model that helps recognize and diagnose epileptic seizures.

First, this paper introduces several methods for handling missing values in data sets, such as mean filling, mode or median filling and k-nearest-neighbor filling, and points out the advantages and disadvantages of each. Then, based on the degree of correlation between the feature attributes in the data set, it proposes a new distance measure, which calculates the Pearson correlation coefficient between features and adds it as a weight to the Euclidean distance calculation, improving the distance measure used by the k-nearest-neighbor filling algorithm. In addition, because the value of k is uncertain, a method for selecting k in k-nearest-neighbor filling is proposed: a scale coefficient is set, and the k nearest samples within the scale coefficient are extracted.

Then, after data preprocessing, including missing value processing, abnormal value processing and normalization, three different combinations of feature selection and model are used to establish an epilepsy patient recognition model: univariate feature selection combined with random forest, recursive feature selection combined with random forest, and an SVM model. The results show that the SVM model outperforms the other two models in accuracy, precision, recall, F1 score, AUC and other metrics.

Finally, although this paper studies only the processing and classification of missing values in medical data sets, the same methods can serve as a reference for handling missing values in other data sets. Reasonable and effective processing of missing values helps uncover the potential information in a data set and improves how efficiently the data are used.
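The correlation-weighted k-nearest-neighbor filling described above can be illustrated with a rough sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the weight of a feature is taken to be the absolute Pearson correlation between that feature and the feature being filled, the weighted squared differences are normalized by the sum of the weights, and the scale coefficient is read as "keep every donor whose distance is at most `scale` times the smallest distance". The names `pearson_weights`, `weighted_knn_fill`, `scale` and `max_k` are hypothetical and do not come from the thesis.

```python
import numpy as np


def pearson_weights(X):
    """Absolute Pearson correlation between every pair of columns,
    computed on the rows where both columns are observed."""
    n_features = X.shape[1]
    W = np.zeros((n_features, n_features))
    for a in range(n_features):
        for b in range(n_features):
            ok = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            # Leave the weight at 0 when the correlation is undefined.
            if ok.sum() < 3 or np.std(X[ok, a]) == 0 or np.std(X[ok, b]) == 0:
                continue
            W[a, b] = abs(np.corrcoef(X[ok, a], X[ok, b])[0, 1])
    return W


def weighted_knn_fill(X, scale=1.5, max_k=5):
    """Fill NaN entries using neighbors found with a correlation-weighted
    Euclidean distance (assumed form, see the lead-in)."""
    X = np.asarray(X, dtype=float)
    W = pearson_weights(X)
    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Candidate donors are rows where feature j is observed.
        donors = np.where(~np.isnan(X[:, j]))[0]
        dists = []
        for d in donors:
            # Compare only features observed in both rows.
            shared = ~np.isnan(X[i]) & ~np.isnan(X[d])
            w = W[j, shared]
            if w.sum() == 0:
                continue
            diff = X[i, shared] - X[d, shared]
            dist = np.sqrt(np.sum(w * diff ** 2) / w.sum())
            dists.append((dist, d))
        if not dists:
            # Fallback when no usable donor exists: column mean.
            filled[i, j] = np.nanmean(X[:, j])
            continue
        dists.sort()
        d_min = dists[0][0]
        # Assumed reading of the scale coefficient: keep donors whose
        # distance is within scale * (smallest distance), capped at max_k.
        near = [d for dist, d in dists if dist <= scale * d_min][:max_k]
        filled[i, j] = X[near, j].mean()
    return filled


if __name__ == "__main__":
    demo = np.array([
        [1.0, 2.0, 9.0],
        [2.0, 4.0, 1.0],
        [3.0, 6.0, 5.0],
        [4.0, 8.0, 2.0],
        [2.0, np.nan, 7.0],
    ])
    print(weighted_knn_fill(demo))
```

In this sketch the number of neighbors is not fixed in advance: it depends on how many donors fall inside the distance band set by `scale`, which is one plausible way to address the uncertainty of k that the abstract mentions.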

Keywords/Search Tags:

missing value filling, medical data, machine learning, distance measurement, auxiliary diagnosis

Related items

1	Research On Intelligent Medical Diagnosis Auxiliary Method Based On Machine Learning
2	The Research On Auxiliary Diagnosis Technology Of Liver Cirrhosis And Its Complications Based On CECT Image
3	Research On Medical Data Imputation Method Based On Stacking Ensemble Learning
4	Research On Medical Data Classification Algorithm Based On Machine Learning
5	Research On Missing Data Interpolation In MIMIC Database Based On Machine Learning
6	Research On Missing Imputation For Medical Data
7	Research On Medical Assistant Diagnosis Of Breast Cancer Based On Machine Learning
8	Research Of Intelligent Hepatopathy Auxiliary Diagnosis System Based On Text Semantic Analysis Of Electronic Medical Records
9	Pre-processing And Quality Analysis Of Medical Big Data Based On Machine Learning
10	A Research On Strategy For Mining Clinical Modifiable Factors And Handling Its Missing Data In Electronic Health Record