Glaucoma Classification On Imbalanced Data Distribution

Posted on: 2023-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chen
Full Text: PDF
GTID: 2544307070983849
Subject: Engineering
Abstract/Summary:
In recent years, computer-aided diagnosis has played an important role in glaucoma classification. However, because the data distribution in clinical samples is severely imbalanced, computer-aided diagnosis methods suffer from "model bias". Since the training data contain fewer glaucoma samples and more normal samples, the model tends to predict glaucoma as normal. This raises the false-negative rate, lowers classification accuracy, and can cause glaucoma patients to miss the best treatment period during early screening.

Based on current research, we find that the data imbalance problem can be divided into three aspects: hard samples, imbalanced feature distribution, and limited labeled samples in the medical field. First, hard samples are the small number of complex retinopathy samples in the glaucoma category, which belong to the tail class. The model struggles to identify hard samples in the tail class because it is dominated by the head class. Second, visualization results show that even when hard samples are accurately identified, the normal class still occupies a large region of the feature space and glaucoma a small one. This biases the model's decision boundary towards normal and causes the problem of imbalanced feature distribution. Finally, these two problems are mainly addressed by transforming the feature dimensions of labeled samples, which does not change the original imbalanced data distribution. How to effectively increase the number of samples and thereby change the data distribution is therefore the main problem posed by limited labeled samples in the medical field.

Based on the above analysis, we propose three glaucoma classification methods that address imbalanced data, improve the classification accuracy of glaucoma, and reduce its false-positive and false-negative rates. The research carried out in this thesis covers the following three aspects.

(1) Because traditional classification methods cannot solve the problem of hard samples in imbalanced data, we propose an evidence-guided curriculum learning classification method that re-weights hard samples. The method uses an evidence map to guide an adaptive curriculum module in accurately identifying hard samples. It re-weights the loss and features of hard samples so that the model pays more attention to learning them. As a result, the method helps the model classify hard samples accurately, improves classification accuracy, and reduces the false-positive rate. Experimental results show that the sensitivity on the LAG and RIM-ONE datasets is 97.1% and 91.6%, respectively.

(2) To address both imbalanced feature distribution and hard samples, we propose a self-ensemble dual-curriculum learning glaucoma classification method that re-balances the feature representation. The method uses a teacher-student network structure to transfer the feature representation of the head class to the tail class and so re-balance the representation. To help the model identify hard glaucoma samples, the method strengthens the identification and processing of hard samples through a purpose-designed curriculum module, which improves classification accuracy on those samples. The final experimental results show that the method has a clear advantage under severely imbalanced data distributions. Even at an extreme imbalance ratio of 1413:1, the sensitivity of the method on LAG is increased by 24.07% compared with the baseline model.

(3) To address limited labeled samples and imbalanced feature distribution in the medical field, we propose a semi-supervised curriculum learning glaucoma classification method. To obtain a re-balanced feature distribution, the method selects a group of high-confidence unlabeled data to take part in the re-balancing training of the labeled data. A self-supervised regularization module is used to enrich the insufficient representation of tail classes. The re-balanced semi-supervised learning paradigm uses a dynamic adaptive loss function to move the model from an imbalanced state to a re-balanced one. Experimental results show that the accuracy of the model on LAG is increased by 20.45% when only 50% of the data is labeled and the imbalance ratio reaches 100:1, and that the accuracy on the TissueMNIST-LT, PathMNIST-LT, OCTMNIST-LT, and DermaMNIST-LT medical datasets is increased by 1.53%, 1.57%, 3.20%, and 0.25%, respectively.
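The effect of class imbalance on the false-negative rate can be illustrated with a small worked example. The sketch below is not taken from the thesis: the sample counts, the helper name `binary_metrics`, and the degenerate "always normal" predictor are all assumptions chosen for illustration. Only the 100:1 ratio mirrors a setting mentioned in the abstract.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and false-negative rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # True positives: glaucoma samples predicted as glaucoma.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: glaucoma samples predicted as normal.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    positives = tp + fn
    return {
        "accuracy": correct / len(pairs),
        "sensitivity": tp / positives if positives else float("nan"),
        "false_negative_rate": fn / positives if positives else float("nan"),
    }


# Hypothetical data: 1000 normal (0) and 10 glaucoma (1) samples, a 100:1 ratio.
y_true = [0] * 1000 + [1] * 10
# Assumed degenerate classifier that predicts "normal" for every sample.
y_pred = [0] * len(y_true)

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this hypothetical case the accuracy is about 99% even though every glaucoma sample is missed (sensitivity 0, false-negative rate 1), which suggests why sensitivity is reported alongside accuracy when the data are imbalanced.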
Keywords/Search Tags: Data Imbalance, Curriculum Learning, Semi-supervised Learning, Glaucoma Classification