| Object detection in aerial images is a challenging task. Although many advanced methods based on convolutional neural networks have become popular for natural scenes, progress on aerial images has been less smooth. Unlike objects in natural scenes, objects in aerial images are arbitrarily oriented, densely distributed, and vary widely in scale, which leads to a series of problems such as feature misalignment, missed detections, and poor detection of objects with large aspect ratios. This thesis makes the following improvements to address these problems: (1) Because horizontal bounding box detection is prone to missed and false detections when objects are densely distributed and arbitrarily oriented, an oriented bounding box with an angle offset relative to the horizontal bounding box is adopted. In a densely distributed scene, each oriented bounding box encloses only its own target, so the IoU between adjacent bounding boxes does not become too large. (2) An improved multi-directional two-stage cascaded R-CNN method is proposed to better detect small, arbitrarily oriented objects in remote sensing images. First, the horizontal regions of interest obtained by the RPN (Region Proposal Network) are converted into rotated regions of interest. The rotated regions extract the features of the objects to be detected more effectively, and a location regression is then performed on the rotated regions of interest for more accurate localization. In the first-level region conversion network, a multi-orientation RoI Align is designed to obtain orientation-sensitive features from multiple horizontal regions of interest with different rotation angles. Meanwhile, an orientation attention module is adopted in the regression branch to adaptively assign a weight to each direction channel, which enhances the orientation sensitivity of the features. (3) To address the feature misalignment caused by the location change of the candidate regions, a multi-branch feature alignment module based on deformable convolution is adopted to resample the features, while dilated convolutions with different dilation rates are used to obtain receptive fields at different scales. In addition, an angle offset penalty loss function based on the object aspect ratio is proposed to alleviate the problem that targets with large aspect ratios are more sensitive to angle offset, so that training pays more attention to learning the angle offset of such objects. Ablation experiments and comparisons with other advanced methods on two public datasets, DOTA and HRSC2016, verify the effectiveness of the algorithms proposed in this thesis. |
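
Two claims in the abstract can be checked numerically: that oriented boxes around densely packed, tilted objects overlap far less than their horizontal counterparts, and that a fixed angle offset costs more IoU as the aspect ratio grows. The following is a minimal sketch, not the implementation used in this thesis. It assumes a `(cx, cy, w, h, angle_deg)` box parameterization and computes rotated IoU with generic Sutherland-Hodgman polygon clipping; all function names and the example box values are hypothetical.

```python
import math

def obb_corners(cx, cy, w, h, angle_deg):
    """Counter-clockwise corners of a box rotated by angle_deg about its center."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

def polygon_area(pts):
    """Unsigned polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def _side(a, b, p):
    """Positive if p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def clip(subject, clipper):
    """Sutherland-Hodgman: clip `subject` by a convex, counter-clockwise `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]          # edge p -> q (index -1 wraps around)
            sp, sq = _side(a, b, p), _side(a, b, q)
            if (sp < 0) != (sq < 0):           # edge crosses the clip line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if sq >= 0:                        # q is inside, keep it
                output.append(q)
    return output

def rotated_iou(box_a, box_b):
    """IoU of two boxes given as (cx, cy, w, h, angle_deg)."""
    pa, pb = obb_corners(*box_a), obb_corners(*box_b)
    inter_poly = clip(pa, pb)
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    return inter / (polygon_area(pa) + polygon_area(pb) - inter)

def horizontal_box(box):
    """Axis-aligned box enclosing a rotated box, in the same 5-tuple format."""
    xs, ys = zip(*obb_corners(*box))
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2,
            max(xs) - min(xs), max(ys) - min(ys), 0.0)

# Two parallel, elongated objects tilted by 45 degrees, 8 units apart (made-up values).
d = 8 / math.sqrt(2)
ship_a = (0.0, 0.0, 40.0, 6.0, 45.0)
ship_b = (-d, d, 40.0, 6.0, 45.0)
obb_iou = rotated_iou(ship_a, ship_b)
hbb_iou = rotated_iou(horizontal_box(ship_a), horizontal_box(ship_b))

# The same 5-degree angle error applied to a square and to an 8:1 rectangle.
iou_square = rotated_iou((0, 0, 10, 10, 0), (0, 0, 10, 10, 5))
iou_long = rotated_iou((0, 0, 80, 10, 0), (0, 0, 80, 10, 5))

print(f"adjacent objects: OBB IoU={obb_iou:.3f}, HBB IoU={hbb_iou:.3f}")
print(f"5-degree offset:  square IoU={iou_square:.3f}, 8:1 box IoU={iou_long:.3f}")
```

In this toy setup the horizontal boxes of the two neighbouring objects overlap substantially while their oriented boxes do not overlap at all, and the elongated box loses noticeably more IoU than the square under the same angle error. The second effect is what an aspect-ratio-based angle penalty is meant to counteract.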