Today, in the wake of the COVID-19 pandemic, people attach great importance to public health security. Sudden outbreaks of infectious disease require rapid response measures, and wearing a mask, one of the most effective physical barriers against pathogens, provides good two-way protection. However, some people remain dissatisfied with epidemic-prevention measures and still do not wear masks in densely populated areas. Relying on manual supervision to enforce mask wearing consumes a lot of human resources and is inefficient, so replacing manual work with an efficient and accurate mask-wearing detection technology is particularly important. Taking the target detection algorithm as its core, this paper introduces a multi-head attention mechanism (Transformer) and multi-scale feature fusion to achieve efficient mask-wearing detection, and improves the YOLOv5 (You Only Look Once version 5) algorithm for the mask-wearing detection task in high-density, complex scenes. The main research contents of this paper include the following parts:

(1) To address the low detection accuracy of the network in high-density scenes crowded with small targets, an attention mechanism is introduced to strengthen the network's extraction of important face and mask features and thereby improve detection accuracy. SE (Squeeze-and-Excitation), CBAM (Convolutional Block Attention Module), and Transformer attention modules are added to the YOLOv5l network for comparative experiments. The results show that introducing an attention mechanism improves the detection performance of the network to a certain extent, and among the variants the YOLOv5l + Transformer model achieves the best detection effect. While the network converges faster, precision, recall, and mean average precision (mAP) increase by 3.8%, 2%, and 3.6% respectively, and the inference time for a single 640 × 640 image decreases by 3.89 ms, giving good detection results in high-density scenes.

(2) An improved feature fusion method is proposed to address insufficient target feature extraction during downsampling in YOLOv5l. By replacing the original PANet (Path Aggregation Network) structure in the YOLOv5l network with a BiFPN (Bidirectional Feature Pyramid Network) structure, selective multi-scale bidirectional feature fusion is realized. The experimental results show that the YOLOv5l network with BiFPN improves precision, recall, and mAP by 2.2%, 2%, and 3.9% respectively, and effectively reduces missed detections in high-density scenes.

(3) Finally, a multi-module improved YOLOv5l network combining Transformer and BiFPN is proposed, and its detection ability is verified experimentally. The results show that the improved network reaches a precision of 90.7%, a recall of 97.0%, and an mAP of 88.2%, with a single-image inference time of 49.44 ms, the best model performance in this study. On the basis of this optimal network, an interactive desktop application is designed and developed with PyQt5, enabling users to perform image detection and real-time video detection with high detection accuracy, with the detection results displayed through a visual interface.
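
As a concrete illustration of contribution (1), the following is a minimal PyTorch sketch of how a multi-head self-attention (Transformer) block can be inserted into a convolutional detection backbone. The class name, channel sizes, and layer arrangement are illustrative assumptions, not the exact modules used in the improved YOLOv5l network.

```python
# Minimal sketch: multi-head self-attention applied to a CNN feature map,
# in the spirit of the Transformer module described in contribution (1).
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Flattens an H x W feature map into a token sequence, applies
    multi-head self-attention plus a feed-forward layer, then restores
    the spatial layout. Channel count and head number are assumptions."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.ffn = nn.Sequential(
            nn.Linear(channels, channels * 4),
            nn.GELU(),
            nn.Linear(channels * 4, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm1(tokens + attn_out)          # residual + norm
        tokens = self.norm2(tokens + self.ffn(tokens))  # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)        # e.g. a deep backbone feature map
    print(TransformerBlock(256)(feat).shape)  # torch.Size([1, 256, 20, 20])
```

Because the block preserves the input shape, it can in principle be dropped into a deep stage of the backbone, where the feature map is small enough for global self-attention to remain affordable.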
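
For contribution (2), the sketch below illustrates the BiFPN idea of weighted (fast normalized) multi-scale fusion: each same-resolution input feature map receives a learnable non-negative weight instead of being summed equally as in plain PANet. The module name and tensor shapes are assumptions for illustration only, not the network's actual implementation.

```python
# Minimal sketch of BiFPN-style weighted feature fusion (fast normalized fusion).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuses same-resolution feature maps with learnable, normalized weights,
    letting the network favour the more informative scale."""
    def __init__(self, num_inputs: int, channels: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, inputs):
        w = F.relu(self.weights)              # keep the fusion weights non-negative
        w = w / (w.sum() + self.eps)          # normalize so the weights sum to ~1
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(fused)

if __name__ == "__main__":
    # Three same-resolution feature maps, e.g. top-down, bottom-up, and skip inputs
    feats = [torch.randn(1, 128, 40, 40) for _ in range(3)]
    print(WeightedFusion(3, 128)(feats).shape)  # torch.Size([1, 128, 40, 40])
```

In a full BiFPN, nodes like this are chained along top-down and bottom-up paths, with upsampling or downsampling applied before fusion so that all inputs to a node share one resolution.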
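
For contribution (3), the following is a minimal PyQt5 sketch of an interactive front end of the kind described above: the user selects an image, a detector is invoked, and the result is displayed in the window. The run_detector() function is a hypothetical placeholder for the improved YOLOv5l model, not the application's actual code.

```python
# Minimal PyQt5 sketch of an image-detection desktop front end.
import sys
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QImage, QPixmap
from PyQt5.QtWidgets import (QApplication, QFileDialog, QLabel,
                             QMainWindow, QPushButton, QVBoxLayout, QWidget)

def run_detector(image_path: str) -> QImage:
    """Hypothetical placeholder: a real implementation would run the improved
    YOLOv5l model and draw detection boxes before returning the image."""
    return QImage(image_path)

class MaskDetectorWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Mask-Wearing Detection")
        self.label = QLabel("Open an image to run detection")
        self.label.setAlignment(Qt.AlignCenter)
        button = QPushButton("Open image")
        button.clicked.connect(self.open_image)
        layout = QVBoxLayout()
        layout.addWidget(self.label)
        layout.addWidget(button)
        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)

    def open_image(self):
        path, _ = QFileDialog.getOpenFileName(self, "Select image", "",
                                              "Images (*.png *.jpg *.jpeg)")
        if path:
            result = run_detector(path)
            self.label.setPixmap(QPixmap.fromImage(result).scaled(
                640, 640, Qt.KeepAspectRatio))

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MaskDetectorWindow()
    window.show()
    sys.exit(app.exec_())
```

Real-time video detection would follow the same pattern, with a timer or worker thread feeding camera frames through the detector and updating the label instead of a one-off file dialog.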