| Gastroscopy is widely used and plays a crucial role in diagnosing gastrointestinal malignancies. However, because gastroscopic images are interpreted manually by physicians, misdiagnoses and missed diagnoses occur easily. For example, when diagnosing early gastric cancer, some gastroenterologists may overlook abnormal signs because of inexperience, or may misjudge the location and type of a lesion, and thus miss the best window for treatment. Deep learning is widely applied in medicine, and multi-class classification of gastroscopic images is one such application that merits in-depth research. To address these problems, this thesis investigates deep learning-based multi-class classification of gastroscopic images. The main work is as follows: (1) To suit the characteristics of gastroscopic image datasets and to improve the performance and robustness of the network model under varied conditions, this thesis proposes a convolutional neural network model that incorporates SimAM modules. The method exploits the ability of the attention mechanism to filter and weight feature maps, strengthening task-relevant features and suppressing task-irrelevant ones; by selecting and optimizing target features in this way, it improves the performance of the convolutional neural network without increasing the number of model parameters. Experimental results show that adding SimAM to the convolutional neural network effectively improves prediction accuracy. (2) To address the fixed receptive field of the Vision Transformer (ViT), this thesis borrows the idea of the Swin Transformer (SiT) and proposes Mobile SiT, a network model that integrates SiT with MobileViT. Replacing the ViT module in the MobileViT block with the SiT module lets the model share information between layers, which improves prediction accuracy. Experimental results on the processed Kvasir dataset show that Mobile SiT is significantly more accurate than the VGGNet, MobileNetV3, GoogLeNet, MobileViT, and residual network (ResNet) models. With classification accuracy as the evaluation metric, the proposed method, Mobile SiT derived from MobileViT and combined with the attention mechanism, achieves 98.57% accuracy on the processed Kvasir dataset, a significant improvement over the benchmark method. |
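
The parameter-free weighting described in contribution (1) can be illustrated with a small sketch. It assumes the commonly used SimAM formulation, in which each activation is scaled by the sigmoid of an inverse-energy term computed from its squared deviation from the channel mean. The function name `simam_channel` and the regularizer default `lam=1e-4` are illustrative assumptions, not the thesis's implementation, and the sketch handles a single channel in plain Python rather than a full network tensor.

```python
import math


def simam_channel(fmap, lam=1e-4):
    """Apply SimAM-style attention to one channel (a 2D list of floats).

    Hypothetical helper for illustration only. No learnable parameters
    are involved: the weights come entirely from channel statistics.
    """
    values = [v for row in fmap for v in row]
    n = len(values) - 1  # number of "other" activations in the channel
    if n < 1:
        raise ValueError("the channel needs at least two activations")

    mean = sum(values) / len(values)
    # Squared deviation of every activation from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in fmap]
    var = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(fmap, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations that stand out.
            inv_energy = d / (4.0 * (var + lam)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * weight)
        out.append(out_row)
    return out


if __name__ == "__main__":
    channel = [[1.0, 1.0, 1.0],
               [1.0, 9.0, 1.0],
               [1.0, 1.0, 1.0]]
    for row in simam_channel(channel):
        print([round(v, 3) for v in row])
```

Because the weights depend only on the mean and variance of each channel, the activation that deviates most from its neighbours is emphasized while the parameter count of the host network stays unchanged.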