Research And Application Of Classification Of Small Sample Clinical Data

Posted on:2020-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y Kang

Full Text:PDF

GTID:2404330602452347

Subject:Computer Science and Technology

Abstract/Summary:


Clinical data contain a great deal of valuable information and are of great significance in helping doctors diagnose and treat diseases accurately. In practice, however, large numbers of clinical samples cannot be obtained, for objective reasons. Records are confidential or incomplete, rare diseases have few samples, and category labels are hard to obtain for some difficult diseases. Classifying clinical data under these conditions is a typical small-sample classification problem. Models trained on small-sample clinical data with traditional classification algorithms perform poorly and cannot meet practical needs. To improve classification performance and provide an effective auxiliary method for clinical diagnosis, this thesis studies small-sample clinical data from different diseases. It proposes a classification method for small-sample clinical data that combines data amplification with collaborative classification. The main contributions are as follows:

1. Starting from the small number of available clinical samples, a data amplification method based on the Gaussian mixture model (GMM) is proposed to obtain a large number of samples. The method estimates the Gaussian mixture distribution of the existing clinical data and then generates a large amount of virtual data with category labels (the amplified data). This provides ample data support for the subsequent classification tasks.

2. Two classification algorithms are proposed under the idea of "data amplification collaborative classification".
The first is a classification algorithm based on data amplification. A large amount of amplified data is generated from the clinical training data. The amplified data and the clinical training data are then combined into a new training set, which is used to train a traditional supervised classification model.
The second is the data amplification collaborative semi-supervised cyclic random forest classification algorithm (DA-SSCRF). Theoretical and experimental analysis of the first algorithm shows that errors in the category labels assigned during data amplification degrade classification performance. To assign high-confidence category labels to the amplified data, this thesis therefore introduces the idea of semi-supervised learning. The clinical training data serve as the labeled data, and the amplified data generated from them serve as the unlabeled data. On this basis, a semi-supervised cyclic random forest classification algorithm is proposed for the small-sample classification problem. By constructing a semi-supervised classification model, the amplified data can contribute to better classification performance.

3. The algorithm was validated on clinical datasets of eight diseases. There, DA-SSCRF improved classification accuracy by 3% to 11% compared with two groups of baselines: supervised classification algorithms without data amplification, and other semi-supervised classification algorithms with data amplification.

4. To demonstrate its practicability, DA-SSCRF was applied to a clinical dataset of meningitis from a Grade-A tertiary hospital. A 10-dimensional meningitis feature set was selected from the original 52-dimensional clinical information using a feature selection method based on the coefficient of variation. In the experiments, DA-SSCRF improved the accuracy of diagnosing the meningitis type by 3%. Relative to diagnoses made by clinicians, the diagnostic rates for tuberculous meningitis and cryptococcal meningitis increased by 6% and 10%, respectively. Using only the 10-dimensional feature set, DA-SSCRF can diagnose meningitis rapidly and efficiently, which is of great significance for distinguishing meningitis types.

In summary, this thesis proposes solutions to the classification of small-sample clinical data that effectively improve the accuracy of disease diagnosis and are of great value in assisting doctors with diagnosis.
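The amplification step of contribution 1 and the first algorithm of contribution 2 can be sketched with scikit-learn as follows. This is a minimal illustration under assumptions the abstract does not state. One `GaussianMixture` is fitted per class, so every sampled point inherits that class's label. The diagonal covariance, component count, `reg_covar` value and sample count are illustrative choices. The name `amplify_per_class` is a hypothetical helper, and a random forest stands in for the unspecified "traditional supervised classification model". The two-class data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture


def amplify_per_class(X, y, n_per_class=200, n_components=2, random_state=0):
    """Hypothetical helper: fit one GMM per class and sample labeled virtual data."""
    X_parts, y_parts = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        # Never request more mixture components than there are samples in the class.
        k = min(n_components, len(X_c))
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              reg_covar=1e-3, random_state=random_state)
        gmm.fit(X_c)
        # sample() returns (points, component ids); the class label comes from the loop.
        X_new, _ = gmm.sample(n_per_class)
        X_parts.append(X_new)
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for a small clinical training set: 2 classes, 15 samples each.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (15, 4)), rng.normal(3.0, 1.0, (15, 4))])
y_train = np.array([0] * 15 + [1] * 15)

X_amp, y_amp = amplify_per_class(X_train, y_train, n_per_class=200)

# First algorithm: merge amplified and real data, then train a supervised model.
X_aug = np.vstack([X_train, X_amp])
y_aug = np.concatenate([y_train, y_amp])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y_aug)
```

Fitting one mixture per class is what makes the labels of the virtual samples come for free. It also offers one plausible source of the label errors the abstract mentions: a class model that spreads into another class's region produces mislabeled points.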
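The abstract does not spell out how DA-SSCRF cycles between labeled and amplified data. The following is therefore only a generic self-training loop around a random forest, an assumption-laden sketch rather than the thesis's algorithm. The real training data form the labeled set. GMM-sampled points, with no class labels kept, form the unlabeled pool. Each round moves pool points whose highest predicted class probability reaches `threshold` into the labeled set, using the predicted class as a pseudo-label. The function name `self_training_rf`, the 0.9 threshold and the round limit are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture


def self_training_rf(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10, random_state=0):
    """Hypothetical sketch: cyclic self-training of a random forest on amplified data."""
    X_l, y_l = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        clf.fit(X_l, y_l)
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is confident enough; stop cycling
        # Move confident points from the pool into the labeled set with pseudo-labels.
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, pool[confident]])
        y_l = np.concatenate([y_l, pseudo])
        pool = pool[~confident]
    clf.fit(X_l, y_l)  # final model on real plus pseudo-labeled data
    return clf, len(X_l) - len(X_lab)


rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, (15, 4)), rng.normal(3.0, 1.0, (15, 4))])
y_train = np.array([0] * 15 + [1] * 15)

# Amplified data used as the unlabeled pool: sample a GMM fitted to all training
# points and discard the component ids, so no class labels are assumed.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)
X_unlab, _ = gmm.sample(300)

model, n_added = self_training_rf(X_train, y_train, X_unlab, threshold=0.9)
```

The confidence threshold trades coverage for label quality. A high value adds fewer amplified points, but those it adds are less likely to carry wrong labels, and wrong labels are the failure mode that motivates the semi-supervised step.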
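Contribution 4 reduces 52 clinical features to 10 using the coefficient of variation, CV = standard deviation / mean. The abstract does not give the exact selection rule. This sketch assumes that CV is computed per feature over all samples and that the k features with the largest CV are kept. The name `select_by_cv`, the `eps` guard against near-zero means, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def select_by_cv(X, k=10, eps=1e-12):
    """Hypothetical rule: keep the k features with the largest coefficient of variation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    cv = std / (np.abs(mean) + eps)  # eps guards against division by a zero mean
    top = np.argsort(cv)[::-1][:k]   # indices of the k largest CVs
    return np.sort(top), cv


# Synthetic 52-feature data: columns 0-9 vary strongly relative to their mean.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=0.5, size=(40, 52))
X[:, :10] = rng.normal(loc=10.0, scale=5.0, size=(40, 10))

selected, cv = select_by_cv(X, k=10)
X_reduced = X[:, selected]
```

CV is scale-free, so features measured in different clinical units can be ranked against each other. It is only meaningful for features with a positive, non-degenerate mean.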

Keywords/Search Tags:

Small sample, Clinical data, Amplification data, Gaussian Mixture Model, Semi-Supervised Learning, Random Forest, Classification


Related items

1	Classification Of DFU Images Based On Deep Semi-Supervised Learning
2	Research On Tumor Histopathological Image Analysis Method Based On Deep Semi-Supervised Learning
3	Research On The Key Issues Of Small Sample Classification And Class Imbalance Classification In Medical Image Aided Diagnosis
4	Research On Hybrid Algorithm Of Medical Insurance Fraud Detection Based On Random Forest
5	Research On The Classification For Tumor Genomics Data
6	Automatic Classification And Recognition Of Peripheral Blood Leukocytes Based On Semi-supervised Learning
7	Classification In Clinical Assistive Diagnosis Based On Small Sample Size Data And Feature Learning
8	Glaucoma Classification On Imbalanced Data Distribution
9	Semi-self-supervised Learning Method Based On Semantic Text Similarity Of Small Sample Electronic Medical Record
10	Analysis Of Semi-Supervised Learning Algorithm Oriented Disease Prediction