| In recent years, man-made deflagration accidents have occurred frequently in densely populated areas, often accompanied by fire and smoke. Given the necessity and urgency of detecting such accidents, this thesis builds on object detection technology from deep learning and takes YOLOv5 (You Only Look Once Version 5), a representative one-stage object detection model, as the subject of an in-depth study of detection performance on multi-scale deflagration targets. To improve the algorithm's multi-scale detection capability, this thesis proposes four optimization methods, covering both algorithm mechanism and network structure, that address the multi-scale shortcomings of YOLOv5. Finally, streaming media technology is used to extend the network model so that it receives a real-time video stream from a surveillance monitor and outputs the video in real time. First, at the microscopic level of algorithm mechanism, the multi-scale performance of YOLOv5 is limited because its detection scales are restricted to those of the 80 object categories in the MS COCO dataset. This thesis builds on CBAM, which combines a channel attention mechanism with a spatial attention mechanism, to improve the performance of the convolution module, and combines it with a shortcut strategy to design the SCConv mechanism. The SCConv module is placed in the neck network to improve feature fusion across multiple scales. Experimental results show that, compared with YOLOv5s, mAP improves by 0.3% and precision by 2%. In addition, in the feature extraction stage of the backbone network, this thesis uses the Swin Transformer, the vision transformer work that received the ICCV 2021 best paper award, to strengthen the backbone's feature extraction capability. YOLOv5s with the Swin Transformer integrated gains 0.9% in mAP, and its precision improves by
3%. Second, at the macroscopic level of network structure, YOLOv5 still uses the earlier PAN structure for feature extraction and fusion. This thesis applies the more recent BiFPN (Bidirectional Feature Pyramid Network) structure to YOLOv5s, which increases mAP by 0.8%. The BiFPN structure is then further improved to strengthen multi-scale detection, yielding the AS-BiFPN (All Scale Bidirectional Feature Pyramid Network) structure designed for multi-scale environments; it improves mAP by a further 1.0% over BiFPN, for a total gain of 1.8% over YOLOv5s, which is close to the detection performance of YOLOv5m. In addition, this thesis studies the optimal network structure parameters for multi-scale detection and, based on YOLOv5, designs network models that handle large and small scales more effectively. In the large-target scenario, the maximum downsampling ratio is increased to 64× and the maximum number of channels to 1536, which strengthens deep-level feature fusion; mAP improves by 2.0% and precision by 4.5%. In the small-target scenario, the maximum number of channels is reduced by 768 and a shallow feature fusion level is added; mAP on small-scale targets improves by 2.0%, while precision remains unchanged. Finally, this thesis combines the small-scale and large-scale models into a single model through a model fusion strategy (Model Embedding), adds an alarm mechanism on top of it, and uses streaming media technology to run inference on the monitor's video stream. |
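
The abstract does not state how BiFPN or AS-BiFPN weights the feature maps it fuses. As a rough illustration only, the sketch below assumes the fast normalized fusion rule from the original BiFPN design, in which each input gets a learnable weight that is clamped to be non-negative and then normalized by the sum of the weights. The function name `fast_normalized_fusion`, the toy vectors `p4_in` and `p5_up`, and the `eps` value are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse equally sized feature vectors: sum(w_i * f_i) / (eps + sum(w_i))."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all input features must have the same length")

    # Clamp each weight at zero (ReLU) so no input contributes negatively.
    clamped = [max(0.0, w) for w in weights]
    # eps keeps the division stable when every weight is clamped to zero.
    total = sum(clamped) + eps

    return [
        sum(w * f[j] for w, f in zip(clamped, features)) / total
        for j in range(length)
    ]


# Hypothetical toy inputs standing in for two flattened feature maps
# that have already been resized to the same resolution.
p4_in = [1.0, 2.0, 3.0]
p5_up = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p4_in, p5_up], weights=[1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the inputs; a weight that training drives below zero is clamped, so that input is simply ignored. In a real network the same rule would be applied to tensors and the weights would be trainable parameters.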