| Users interact with mobile applications through user interfaces that are implemented in a specific language and framework. Given a screenshot of an application, a developer can convert it into code and continue development from there. UI designers usually take the screenshot as an example and use vector drawing tools to reconstruct the user interface. With the continuous diversification of user interfaces, it is difficult for designers to determine the position coordinates and categories of components by eye, which makes reconstructing UI screenshots very difficult. Therefore, an automated method for detecting user interface components is extremely important, and it can guide designers in reconstructing UI screenshot examples. At present, traditional methods have been used to identify user interface components, but they cannot classify them, and detection with traditional computer vision techniques is very time-consuming. In recent years, deep learning has attracted growing attention. Researchers have proposed many deep-learning-based object detection algorithms for a variety of scenes. These algorithms also provide new ideas and techniques for detecting user interface components. To address the current problems in user interface component detection, this paper proposes using a deep-learning-based object detection algorithm for component detection. The task of component detection comprises locating and classifying components. YOLOv3 is a classic one-stage object detection algorithm, but experiments show that applying it directly to UI component detection yields very low detection accuracy. Therefore, this paper makes corresponding improvements to YOLOv3. The first is the design of the backbone network. To accurately identify and locate the target, it is necessary to obtain a feature map with rich information. In this 
paper, DenseNet is selected as the backbone network. The densely connected structure allows the extracted features to be fully used. On this basis, a network containing five dense blocks is designed to extract features. In addition, channel attention and spatial attention mechanisms are introduced into the interior structure of the dense block to further enhance the feature extraction ability of the backbone network. The second is the detection stage. The UI components studied in this paper vary widely in size. Therefore, this paper designs a multi-scale detection structure that fuses features of different scales and performs multi-scale prediction on four feature maps of different scales, thereby improving the performance of the network. The last is the loss function design. A large number of candidate boxes are generated in the detection stage, most of which are not responsible for detecting a target, so an excess of negative samples dominates the network training process. To solve this problem, focal loss is used as the classification loss function to reduce the contribution of negative samples to the loss. This paper conducts experiments on real UI data sets. The collected data is preprocessed and the format of the data set is restructured. Multiple sets of focal loss parameter values are tested to select an optimal set of parameters. Comparative experiments are conducted against both classic deep-learning-based object detection algorithms and the traditional detection method. Recall, precision and mAP are selected as evaluation metrics, and mAP is used to measure the overall performance of the model. The experimental results show that the recall of the proposed method reaches 76.35% and the mAP reaches 55.43%, and its detection performance is better than that of the other object detection algorithms. Compared with the traditional detection method, the method in this paper adds the 
component classification function and is far superior to the traditional detection method in recall and precision. The experimental results show that the proposed method is more effective for detecting user interface components. |
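
The abstract uses focal loss as the classification loss so that the many easy negative candidate boxes contribute less to training. The following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), compared with plain cross-entropy. The function names and the values `alpha=0.25` and `gamma=2.0` are illustrative assumptions; the abstract does not state which parameter values were finally selected.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    alpha and gamma are assumed example values, not the paper's settings.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (confidently rejected) and a hard negative (wrongly
# scored as likely positive), both with ground-truth label 0.
samples = {"easy negative": (0.05, 0), "hard negative": (0.80, 0)}
for name, (p, y) in samples.items():
    ce = cross_entropy(p, y)
    fl = focal_loss(p, y)
    print(f"{name}: CE={ce:.5f} FL={fl:.5f} FL/CE={fl / ce:.5f}")
```

In this sketch the easy negative is down-weighted far more strongly than the hard negative, which is the effect the abstract relies on to keep negative samples from dominating the loss.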