| Object detection is an important research topic in computer vision. Traditional fully supervised object detection methods require large amounts of training data annotated with object category labels and bounding boxes to train and test the model. Producing such instance-level annotations costs considerable time and labor. It is therefore important to explore weakly supervised object detection, which uses only image-level annotations as training data. In this paper, a weakly supervised object detection method based on candidate region cluster learning is refined to improve detection performance, and the feasibility of applying it is verified by training and testing on high-level monitored image datasets. (1) To address the local focus problem of weakly supervised object detection methods on single-object images, this paper proposes a weakly supervised object detection algorithm based on class activation maps and spatial maps. Because accurate supervision information is lacking, weakly supervised object detection is mostly handled with multi-instance learning. Multi-instance learning has an inherent limitation, namely non-convexity, which makes the detection results prone to local focus and in turn degrades detection performance. By introducing the class activation map and the spatial map, this paper fully mines the position of the object in the image and its spatial context information. This guides the network to select better cluster centers, improves the clustering of candidate regions, raises the performance of the candidate region optimization branch, and alleviates the local focus problem of weakly supervised object detection. (2) To address the problem that weakly supervised object detection methods produce oversized detection boxes or miss objects in multi-object images, this paper proposes a multi-branch refinement weakly 
supervised object detection algorithm. It introduces three network branches (a spatial information branch, a category counting branch, and a category label semantic information branch) so that the latent information in images can be fully mined. The spatial information branch uses the class activation map and the spatial map to capture the spatial context information of the object, the category counting branch uses the number of objects as pseudo-supervision, and the category label semantic information branch uses the skip-gram model to mine the semantic dependencies between category labels. The information obtained by these branches serves as pseudo-supervision, which compensates for the insufficient supervision available to weakly supervised object detectors, enables the network to locate the objects in an image more accurately, and improves the performance of the detector. |
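
To make the role of the class activation map in cluster-center selection more concrete, the following is a minimal sketch, not the method of this paper. It computes a class activation map in the standard way (a weighted sum of the last convolutional feature maps, using the weights of the classifier that follows global average pooling) and then uses it to re-rank candidate regions. The function names, the coverage-times-purity scoring rule, and the mixing weight `alpha` are all assumptions introduced for illustration only.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weighted sum of feature maps for one class.

    features:   (K, H, W) feature maps from the last convolutional layer
    fc_weights: (C, K) weights of the classifier after global average pooling
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)  # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def pick_cluster_center(cam, boxes, mil_scores, alpha=0.5):
    """Hypothetical re-ranking rule (an assumption, not taken from the paper).

    boxes:      (N, 4) integer boxes as x1, y1, x2, y2 in CAM coordinates,
                with x2 and y2 exclusive
    mil_scores: (N,) proposal scores from the multi-instance learning branch
    """
    total = cam.sum()
    cam_scores = np.zeros(len(boxes))
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        inside = cam[y1:y2, x1:x2]
        if inside.size == 0 or total <= 0:
            continue
        coverage = inside.sum() / total  # share of the activation the box contains
        purity = inside.mean()           # how densely the box is filled with activation
        cam_scores[i] = coverage * purity
    combined = alpha * np.asarray(mil_scores, dtype=float) + (1.0 - alpha) * cam_scores
    return int(np.argmax(combined)), combined


# Toy example: one 2x2 object in a 4x4 feature map.
features = np.zeros((2, 4, 4))
features[0, 1:3, 1:3] = 1.0
fc_weights = np.array([[1.0, 0.0]])
cam = class_activation_map(features, fc_weights, class_idx=0)

boxes = np.array([
    [1, 1, 2, 2],  # covers only part of the object
    [1, 1, 3, 3],  # covers the whole object tightly
    [0, 0, 4, 4],  # covers the whole image
])
mil_scores = np.array([0.9, 0.6, 0.3])  # the MIL branch prefers the partial box
center, combined = pick_cluster_center(cam, boxes, mil_scores)
print(center, combined)
```

In this toy case the multi-instance learning scores alone would pick the partial box, which is the local focus behavior described above, while the combined score favors the box that tightly covers the whole activated region. Multiplying coverage by purity is one simple way to penalize boxes that are too small as well as boxes that are too large; other rules are equally plausible.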