Object detection based on deep learning aims to locate an object's region and identify its category, and it is a crucial technology for automatic driving systems. In the era of artificial intelligence, with the development of computer hardware, object detection is widely used in various embedded devices, such as contactless robots, intelligent monitoring systems, and the automatic driving systems of automobiles. In machine vision, this technology plays a considerable role. In automatic driving, both 2D and 3D object detection can be used to perceive the surrounding scene. In 2D small object detection, small objects occupy few pixels, and overly deep networks lose position information, making feature extraction challenging; in addition, many proposals must be generated, which slows detection. 2D object detection is strongly influenced by lighting conditions, whereas 3D object detection based on LiDAR point clouds is relatively unaffected by lighting. Hence, 3D object detection helps autonomous vehicles perceive the surrounding scene at night. However, the pose and position of an object are more complex and harder to predict in 3D space. Meanwhile, a point cloud is unstructured data, which makes feature extraction more difficult. Therefore, this thesis proposes two methods to address these drawbacks, one for 2D small object detection and one for 3D object detection. The research conducted in this thesis includes the following aspects.

(1) To address the problem that the network generates many useless regions of interest (ROIs), resulting in slow detection, this research proposes a two-stage object detection framework that first locks onto ROIs at coarse granularity and then locates objects at fine granularity. The method fuses low-level features with high-level features using a feature pyramid network and a path enhancement network, reducing the loss of location information and improving detection accuracy. It uses an integrated tagging approach for the first detection stage, focusing on regions where objects are likely to be present, with each ROI containing several objects; this effectively reduces the number of ROIs and speeds up detection, while also solving the truncation problem of the second-stage slicing approach. In the second stage, all ROIs are resized to a uniform size through ROI pooling, and second-stage detection is carried out in parallel to further accelerate the network.

(2) To address the difficulty of extracting features from unstructured 3D point cloud data with convolutional neural networks, this research proposes extracting point cloud features with PointNet++ and a self-attention module, effectively retaining the original point cloud coordinate information. It presents a shell-based modeling approach to improve accuracy: a coordinate is first coarsely assigned to a spherical shell, and the result is then refined to its actual value, narrowing the localization range and improving detection accuracy. To improve the recall of the 3D detection bounding boxes, a self-attention module with a skip-connection structure is designed, in which some features are emphasized by weighting along the feature dimension. After training, the feature weights favorable for object detection are scaled up, so the extracted features are more suitable for object detection.
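The ROI pooling step in the first method can be illustrated with a minimal sketch: each region of interest, whatever its size, is divided into a fixed grid of bins and max-pooled, so all ROIs share one output shape and the second stage can process them in parallel. The feature-map sizes and the 2x2 output grid below are illustrative, not taken from the thesis.

```python
import numpy as np

def roi_pool(feature_map, x0, y0, x1, y1, out=2):
    """Max-pool the ROI [y0:y1, x0:x1] of a 2D feature map to out x out."""
    roi = feature_map[y0:y1, x0:x1]
    h, w = roi.shape
    ys = np.linspace(0, h, out + 1).astype(int)  # bin boundaries (rows)
    xs = np.linspace(0, w, out + 1).astype(int)  # bin boundaries (cols)
    pooled = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            pooled[i, j] = roi[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return pooled

fmap = np.arange(36, dtype=float).reshape(6, 6)
# Two ROIs of different sizes pool to the same fixed output shape.
assert roi_pool(fmap, 0, 0, 6, 6).shape == (2, 2)
assert roi_pool(fmap, 1, 2, 5, 6).shape == (2, 2)
```

Because every ROI is mapped to the same fixed-size tensor, the second-stage detector can batch them and run in parallel, which is the speed-up described above.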
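The shell-based coarse-to-fine idea in the second method can be sketched as follows. Under the assumption of equal-width concentric shells (the `encode`/`decode` helpers and the shell width are hypothetical, for illustration only), a distance is first classified into a shell index (coarse step) and then refined by a normalized residual within that shell (fine step), which narrows the regression range.

```python
def encode(distance, shell_width):
    """Split a distance into a shell index (coarse) and a residual (fine)."""
    shell = int(distance // shell_width)       # coarse: which spherical shell
    residual = distance / shell_width - shell  # fine: offset in [0, 1)
    return shell, residual

def decode(shell, residual, shell_width):
    """Recover the distance from a shell index and its residual."""
    return (shell + residual) * shell_width

shell, res = encode(3.7, shell_width=1.0)
assert shell == 3                              # coarse classification target
assert abs(decode(shell, res, 1.0) - 3.7) < 1e-9  # refinement recovers value
```

The network then only has to regress a value bounded by the shell width rather than the full range, which is the localization-range narrowing described above.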
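The feature-dimension weighting with a skip connection can likewise be sketched in a few lines. This is one plausible form (resembling a squeeze-and-excitation gate), not the thesis's exact module; the layer sizes and random weights are placeholders for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_channels, hidden = 128, 64, 16

features = rng.standard_normal((n_points, n_channels))  # point features
w1 = rng.standard_normal((n_channels, hidden)) * 0.1    # assumed learned weights
w2 = rng.standard_normal((hidden, n_channels)) * 0.1

pooled = features.mean(axis=0)                          # global channel context
hidden_act = np.maximum(pooled @ w1, 0.0)               # ReLU bottleneck
gate = 1.0 / (1.0 + np.exp(-(hidden_act @ w2)))         # per-channel weight in (0, 1)
out = features * gate + features                        # reweight + skip connection

assert out.shape == features.shape
```

After training, channels whose weights help detection receive gates near 1 and are scaled up, while the skip connection keeps the original features available, matching the recall-oriented design described above.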