| Steel is one of the most important products of the steel industry and is widely used in many fields. However, owing to the limitations of production equipment and process conditions, surface defects such as scratches and crazing often appear during production; these affect not only the appearance of the product but also its performance. Detecting surface defects during steel production and manufacturing is therefore important. With the rapid development of artificial intelligence, deep learning has gradually become the mainstream approach to many object detection tasks, offering automated, accurate, and real-time detection. Previous defect detection models are mostly anchor-based: they require prior knowledge as an aid and depend on the designer's experience, their computational complexity is high, and their parameters are complicated to tune. In this thesis, DETR, an end-to-end detection method based on the anchor-free idea, is used as the baseline to investigate an effective architecture for steel surface defect detection. Improvements are proposed to address the shortcomings of the original DETR model, including poor performance on small objects, an overly complex process for handling sparse features, and the large number of training epochs required. The main work is as follows. (1) An improvement is proposed for the backbone network in the front-end structure of DETR. Building on the multi-stage hierarchical architecture of the original ResNet backbone, the backbone is combined with a feature pyramid network (FPN). The four stages of ResNet are connected through the stacked network transformation to build a stacked network containing multi-layer features. To further extract multi-scale features and better detect small objects, a multi-receptive-field fusion module is attached after each stage of ResNet. Comparison experiments against the original DETR backbone verify that the backbone combined with the FPN extracts the global features of the image data more fully. Comparison experiments are also conducted on small-object defect classes, and they demonstrate that the improved model handles the small object detection task better. (2) Position information is processed with a position encoding generator (PEG). The original DETR position encoding uses pre-defined functions in order to keep the number of parameters small, which lacks flexibility when processing practical data. In this thesis, we propose dynamic position encoding (DPE), which maintains the translation invariance of the model and reduces the computational workload by conditioning on the local neighborhood of the input information. DPE also adapts to arbitrary input sizes and supports the processing of high-resolution images. These two improvements target the front-end structure of DETR, and their effectiveness is demonstrated by comparative experiments. An improvement to the back-end structure, based on the Transformer part of the original DETR, is then proposed. (3) Based on the ideas of deformable convolution and Deformable DETR, the attention mechanism in the Transformer is replaced with a deformable attention mechanism and combined with the improved front-end structure to obtain the final SPAttentionDETR. After connecting to the improved DETR front end, the encoder uses a stack of 2 original DETR encoder layers and 4 improved DETR encoder layers, and the decoder uses a stack of 6 improved DETR decoder layers. In comparison experiments with state-of-the-art (SOTA) models such as Deformable DETR, an improvement of 1.2% in mAP over all classes is achieved. The resulting model provides an effective detection architecture for steel surface defect detection. |
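
The idea behind a position encoding generator can be illustrated with a minimal sketch. It assumes, as in common PEG formulations, that the encoding for each token is produced by a small zero-padded convolution over its local neighborhood and then added back to the token. The function name `peg_single_channel`, the single-channel simplification, and the kernel values are hypothetical and chosen only for illustration; the abstract does not specify the exact form of DPE used in the thesis.

```python
from typing import List

Grid = List[List[float]]


def peg_single_channel(feature: Grid, kernel: Grid) -> Grid:
    """Add a position encoding generated from each token's local neighborhood.

    Hypothetical illustration: one channel, an odd-sized square kernel,
    and zero padding at the borders. In a trained model the kernel would
    be learned; here it is passed in explicitly.
    """
    h, w = len(feature), len(feature[0])
    r = len(kernel) // 2  # kernel radius, e.g. 1 for a 3x3 kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * feature[yy][xx]
            # Residual form: original token plus its generated encoding.
            out[y][x] = feature[y][x] + acc
    return out
```

Because the same kernel slides over every location, the function accepts feature maps of any height and width, and the zero padding makes border tokens receive a different response from interior tokens, which is one way such a scheme can carry position information without a fixed-size encoding table.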