Methods based on Convolutional Neural Networks (CNNs) are gradually replacing traditional object detection methods, thanks to their high accuracy and strong adaptability. These methods are widely applied in industrial environments, particularly for detecting surface defects in steel strips. Despite significant advances, however, surface defect detection in steel strips still faces several challenges. For instance, the variety of complex defect forms and texture variations in industrial settings can degrade the generalization ability of the models. Developing accurate and fast detection models for steel strip surface defects is therefore an important area of research. This paper uses the YOLOX deep learning object detection algorithm to investigate the detection of surface defects on steel strips, a metal raw material widely used in industry. The research focuses on the following aspects:

(1) To meet the real-time requirements for surface defect detection in current industrial environments, a Lightweight Cross Stage Partial structure (L-CSP) is proposed based on the original CSP architecture. By compressing the number of channels in the bottleneck layers and decomposing the 3×3 convolutions used for feature extraction in the stacked residual structures into 3×1 and 1×3 convolutions, this structure reduces the model's parameter count and computational complexity. This allows the model to be deployed on mobile devices or other hardware platforms without compromising detection accuracy. Experimental results show that the L-CSP structure significantly reduces the model's parameters, floating-point operations, and size, and improves the detection frame rate, with minimal impact on detection accuracy.

(2) To address poor feature extraction quality and low detection accuracy in the detection model, a Multi-scale Feature Fusion Attention Module (MFFAM) is proposed. Three sets
of feature maps with different resolutions are obtained from the backbone feature extraction network and passed into the feature enhancement module. Receptive fields of different scales are used in the feature extraction modules to enhance feature representation. The features are then fed in parallel into the spatial and channel attention modules to avoid mutual interference between the two. Although the proposed attention module slightly increases the parameter count and marginally lowers the detection frame rate, the model integrated with it achieves a significant improvement in mAP over the baseline model. This improves the overall performance of the detection network and reduces false positives and false negatives.

(3) To further reduce the model size and enable deployment on other heterogeneous platforms, this paper proposes model compression techniques for YOLOX, building on the structural improvements above. First, a stage-wise convolutional kernel importance evaluation algorithm is proposed that considers the influence of both the L1 and L2 norms; it uses structured pruning to reduce parameters and connections. Second, during training, low-precision computation is simulated by replacing high-precision values with low-bit-width values, and after training all weights are quantized to 8 bits. Test results show that pruning and quantization significantly compress the model and improve the detection frame rate with only minimal loss in accuracy.

Finally, a generic CNN acceleration IP core is designed and implemented on the ZYNQ platform. High-level synthesis tools are used to design IP cores for the convolutional and pooling layers. The convolution and pooling operations run several times faster than on the ARM platform while retaining generality.
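
As a rough illustration of why decomposing a 3×3 convolution into 3×1 and 1×3 convolutions saves parameters, the sketch below counts weights for both variants. It assumes bias-free layers and an unchanged channel count between the two factorized convolutions; the width of 64 channels and the function names are hypothetical and not taken from the paper.

```python
def conv_params(c_in, c_out, k_h, k_w, bias=False):
    """Number of learnable parameters in a 2D convolution layer."""
    params = c_in * c_out * k_h * k_w
    if bias:
        params += c_out
    return params


def factorized_params(channels):
    """Parameters of a 3x1 convolution followed by a 1x3 convolution.

    Assumption: the channel count stays the same between the two layers.
    """
    return conv_params(channels, channels, 3, 1) + conv_params(channels, channels, 1, 3)


channels = 64  # hypothetical bottleneck width, for illustration only
full = conv_params(channels, channels, 3, 3)
split = factorized_params(channels)
print(full, split, split / full)
```

Under these assumptions the factorized pair needs 6C² weights instead of 9C², that is, two thirds of the original count, independent of the channel width C.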
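
The abstract does not state how the L1 and L2 norms are combined when ranking convolutional kernels, so the following is only a minimal sketch of norm-based structured pruning under one possible assumption: a weighted sum of the two norms. The mixing weight `alpha`, the keep ratio, and the function names are hypothetical; in a stage-wise scheme the keep ratio would presumably be chosen separately for each stage.

```python
import math


def filter_scores(filters, alpha=0.5):
    """Score each filter by a weighted mix of its L1 and L2 norms.

    `filters` holds one flattened filter (list of floats) per output channel.
    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    scores = []
    for w in filters:
        l1 = sum(abs(x) for x in w)
        l2 = math.sqrt(sum(x * x for x in w))
        scores.append(alpha * l1 + (1.0 - alpha) * l2)
    return scores


def select_filters(filters, keep_ratio, alpha=0.5):
    """Return the sorted indices of the filters kept after pruning."""
    scores = filter_scores(filters, alpha)
    n_keep = max(1, round(keep_ratio * len(filters)))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


layer = [
    [0.9, -0.8, 0.7, 0.6],     # large-magnitude filter
    [0.01, -0.02, 0.0, 0.01],  # near-zero filter
    [0.5, 0.5, -0.5, 0.5],     # large-magnitude filter
    [0.05, 0.0, -0.05, 0.0],   # near-zero filter
]
kept = select_filters(layer, keep_ratio=0.5)
print(kept)
```

Because whole filters are removed rather than individual weights, the pruned layer stays dense and needs no sparse-matrix support at inference time.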
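
To make the 8-bit quantization step more concrete, here is a minimal sketch of simulated low-precision training values followed by integer conversion. It assumes symmetric per-tensor quantization with a single scale factor, which is one common choice and not necessarily the scheme used in the paper; all names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map 8-bit integers back to approximate floating-point values."""
    return [v * scale for v in q]


def fake_quantize(weights):
    """Quantize then dequantize, simulating low precision during training."""
    q, scale = quantize_int8(weights)
    return dequantize(q, scale)


weights = [0.42, -1.27, 0.003, 0.9]  # hypothetical weights
q, scale = quantize_int8(weights)
approx = fake_quantize(weights)
print(q, scale, approx)
```

During training the forward pass would use values like `approx`, so the network learns to tolerate the rounding error; after training only the integers `q` and the scale need to be stored.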