| Clinical data contain a wealth of valuable information that helps doctors diagnose and treat diseases accurately. In practice, however, large numbers of clinical samples are rarely available, owing to confidentiality, incomplete records, the small number of rare-disease cases, and the difficulty of obtaining category labels for hard-to-diagnose diseases. The resulting clinical data classification problem is a typical small-sample classification problem, and models trained on such data with traditional classification algorithms perform too poorly to meet real-world needs. To improve classification performance and provide an effective auxiliary method for clinical diagnosis, this paper studies small-sample clinical data for different diseases and proposes a small-sample clinical data classification method that combines data amplification with collaborative classification. The main contributions of this paper are as follows: 1. To address the small number of clinical samples, a data amplification method based on the Gaussian mixture model is proposed. By estimating the Gaussian mixture distribution of the existing clinical data, a large amount of virtual data with category labels (the amplified data) is generated to support the subsequent classification tasks. 2. Two classification algorithms are proposed under the idea of "data amplification with collaborative classification". The first is a classification algorithm based on data amplification: a large amount of amplified data is generated from the clinical training data, and the amplified data and the clinical training data are then combined into a new training set used to train a traditional supervised classification model. The second is the data amplification collaborative semi-supervised cyclic random forest classification algorithm (DA-SSCRF). Theoretical and experimental analysis of the first algorithm shows that errors in the category labels assigned during data amplification degrade classification performance. Therefore, to assign high-confidence category labels to the amplified data, this paper introduces semi-supervised learning: the clinical training data serve as the labeled data and the amplified data generated from them serve as the unlabeled data, yielding a semi-supervised cyclic random forest classification algorithm for the small-sample classification setting. With this semi-supervised classification model, the amplified data can contribute to better classification performance. 3. In validation on clinical datasets for eight diseases, the accuracy of the DA-SSCRF algorithm was 3% to 11% higher than that of supervised classification algorithms without data amplification and of other semi-supervised classification algorithms with data amplification. 4. To demonstrate the practicality of the DA-SSCRF algorithm, it was applied to a clinical dataset of meningitis cases from a top-tier (Grade III, Class A) hospital. A 10-dimensional feature subset was selected from the original 52-dimensional clinical information using a feature selection method based on the coefficient of variation. The experimental results showed a 3% improvement in diagnostic accuracy for meningitis type, and the diagnostic rates for the two types diagnosed by clinicians, tuberculous meningitis and cryptococcal meningitis, increased by 6% and 10%, respectively. Using only the 10-dimensional meningitis features, the DA-SSCRF algorithm can diagnose meningitis quickly and efficiently, which is of great value for identifying the meningitis type. In summary, this paper proposes solutions for classifying small-sample clinical data that effectively improve the accuracy of disease diagnosis and can assist doctors in making diagnoses. |
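
The data amplification step described in the abstract can be illustrated with a minimal sketch. It rests on loud assumptions: it fits a single diagonal Gaussian per class, which is only the one-component special case of the Gaussian mixture model the paper proposes, and the function name `amplify`, the toy feature values, and the labels `"A"` and `"B"` are hypothetical, not taken from the paper's data.

```python
import random
import statistics
from collections import defaultdict


def amplify(samples, labels, n_per_class, seed=0):
    """Generate labeled virtual samples from one diagonal Gaussian per class.

    Assumption: a single Gaussian per class stands in for the full
    Gaussian mixture model; features are treated as independent.
    """
    rng = random.Random(seed)

    # Group the labeled training rows by category label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    virtual_x, virtual_y = [], []
    for y, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple per feature
        means = [statistics.fmean(col) for col in columns]
        # Population standard deviation; 0.0 for a constant feature,
        # in which case gauss() simply returns the mean.
        stds = [statistics.pstdev(col) for col in columns]
        for _ in range(n_per_class):
            virtual_x.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            virtual_y.append(y)  # virtual sample inherits the class label
    return virtual_x, virtual_y


# Hypothetical toy data: four samples, two features, two classes.
train_x = [[36.8, 4.1], [37.2, 4.5], [39.5, 9.8], [39.9, 10.4]]
train_y = ["A", "A", "B", "B"]

aug_x, aug_y = amplify(train_x, train_y, n_per_class=50)

# New training set: original clinical samples plus amplified samples.
new_x = train_x + aug_x
new_y = train_y + aug_y
```

In the first algorithm the combined set (`new_x`, `new_y`) would train a supervised classifier directly. In DA-SSCRF the labels in `aug_y` would instead be set aside and the amplified samples treated as unlabeled data, since the abstract notes that label errors introduced at this step degrade performance.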