| With continued research on deep learning, more and more theories have been put into practice, and autonomous driving is a representative example. Accurately perceiving the driving environment around the vehicle, such as other vehicles, pedestrians, and obstacles, has become a major problem restricting the development of autonomous driving. The purpose of perception is to convey information about the vehicle's surroundings to the autonomous driving system, including which areas are passable and which are not. The most widely used approach is indirect perception, in which a driving situation map is constructed from the distance, speed, shape, and other attributes of surrounding objects. With the development of artificial intelligence, one current mainstream trend is to use deep learning to construct a mapping between sensor data and driving behavior that acts directly on the driving operation system. However, the high complexity of road scenes remains a challenge for autonomous driving perception systems. Computer vision solutions based on deep learning can greatly reduce the cost of such perception systems. This paper therefore studies deep learning methods for road scene instance segmentation and introduces the relevant theory in detail. Instance segmentation is a computer vision task that requires predicting object instances together with their individual pixel-level segmentation masks. Current mainstream instance segmentation models are limited either by the performance of their object detectors or by their ability to learn and process pixel-level information. Faced with the frequently occluded and small-scale objects found in road scenes, their segmentation quality is poor. Based on an analysis of these problems, this paper proposes an instance segmentation model with a multi-scale fusion attention mechanism based on the SOLOv2
network, which enhances the model’s ability to represent multi-scale features. The model performs well on the Cityscapes road scene dataset. The specific work of this paper is as follows: (1) A multi-scale fusion attention mechanism is designed. Multi-scale features are constructed by applying convolutions of different scales to channel groups, which effectively extracts finer-grained multi-scale spatial information and establishes more efficient channel dependencies. (2) A weighted bidirectional feature pyramid network is used to assign weights to the input features, which strengthens the feature layers’ ability to extract multi-scale features and improves the feature fusion method, thereby reducing the computational cost. (3) A dynamic convolution is designed. Using an attention mechanism, a convolution kernel with fixed parameters is converted into one that adapts to the input, and multiple convolution kernels are dynamically combined. Through grouping, the expressive ability of the network is enhanced without greatly increasing the computational cost. Comparative and ablation experiments are carried out on Cityscapes, and the accuracy of the proposed model is compared with that of several mainstream instance segmentation methods. The average precision of the proposed method on Cityscapes reaches 38.8%, which is 3.5 percentage points higher than the original SOLOv2 model, and it also outperforms the comparison models. The method in this paper can therefore serve as a reference for instance segmentation in road scenes. |
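
Contributions (2) and (3) can be illustrated with a small sketch. The abstract does not give the exact formulas, so this is only an assumption about their general shape. The weighted fusion is assumed to follow the fast normalized fusion rule commonly associated with weighted bidirectional feature pyramids. The dynamic convolution is assumed to blend K candidate kernels using softmax attention scores. The function names (`fast_normalized_fusion`, `aggregate_kernels`) are hypothetical. The feature maps and kernels are flattened into plain lists. In a real model the attention logits would be produced from the input by a small network, whereas here they are passed in directly.

```python
import math


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Assumed rule: out = sum_i(w_i * f_i) / (eps + sum_i(w_i)), with w_i >= 0.
    """
    # Clamp the (learnable) weights at zero so every contribution is non-negative.
    w = [max(0.0, x) for x in weights]
    total = sum(w) + eps  # eps keeps the division stable if all weights are zero
    length = len(features[0])
    return [
        sum(w[i] * features[i][j] for i in range(len(features))) / total
        for j in range(length)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def aggregate_kernels(kernels, attention_logits):
    """Blend K candidate kernels into one kernel using attention scores.

    Assumed rule: kernel = sum_k(pi_k * kernel_k), where pi = softmax(logits).
    """
    pi = softmax(attention_logits)
    size = len(kernels[0])
    return [
        sum(pi[k] * kernels[k][j] for k in range(len(kernels)))
        for j in range(size)
    ]


# Two feature levels fused with equal weights: roughly their element-wise mean.
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])

# Two candidate kernels with equal attention: their element-wise average.
kernel = aggregate_kernels([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
```

Because the kernels are blended before the convolution is applied, only one convolution is run per input regardless of K. This matches the abstract's claim that expressiveness increases without a large increase in computational cost.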