Traffic scenarios are complex and constantly changing. Equipping cars with safe and reliable assisted driving systems can greatly reduce traffic accidents and improve the efficiency of urban traffic. Experienced drivers can quickly locate salient areas in a scene thanks to their selective attention mechanism: they filter out redundant information and extract the key information relevant to driving. Imitating the human selective attention mechanism to predict salient regions or objects in traffic scenes is therefore of great research significance for the development of assisted driving systems. This paper investigates driver attention prediction in complex traffic scenarios, with the aim of predicting the driver's attention regions or detecting salient objects. It explores the full mining of spatial features, the extraction of motion information, and the estimation of salient objects, and it proposes corresponding models and methods to effectively model the driver's selective attention mechanism. The main contents and results of this paper are as follows.

(1) From the perspective of scene parsing, this paper recasts driver attention prediction as the problem of classifying the regions of a traffic scene as salient or non-salient. Using deep learning, it proposes a driver attention prediction model based on a fully convolutional network, optimized with a combination of multiple loss functions. Given videos of complex traffic scenes as input, the model predicts the regions of driver attention in each scene. A comparison with traditional computational models demonstrates the effectiveness of the method.

(2) This paper proposes the Salient Object-Guided Attention Fusion Network (SOGAF-Net) for driver attention prediction. The method first addresses missed and false detections of small objects. Built on a fully convolutional network, a salient object-guided module is proposed to effectively fuse the high-level semantic features of an object detection network. This module obtains semantic features that contain salient objects at different scales. The spatial features extracted by the backbone network are then fused with the semantic information of the salient objects to increase the weights of those objects. Finally, a Conv-LSTM network extracts spatio-temporal information from the fused feature maps, which allows SOGAF-Net to capture salient temporal features across consecutive frames. SOGAF-Net improves the detection accuracy of small salient objects and ensures the temporal coherence of its predictions.

(3) This paper proposes the Adaptive Short-Temporal Features Induced Aware Fusion Network (ASTAF-Net). First, because salient motion information is difficult to capture, a dynamic feature extraction module (DFM) is proposed; the DFM uses the spatial features of consecutive frames to compute salient motion information. Second, in the spatial feature extraction stage, a correlation analysis cell is proposed that enhances contextual information by combining convolutions with different receptive fields. Finally, to address the excessive redundant regions in the prediction results, this paper annotates the driver attention objects in the TDV and DADA-2000 databases and, based on an object detection algorithm, proposes an object saliency estimation branch that predicts object-level attention. ASTAF-Net can therefore find both salient regions and salient objects in traffic scenes. The proposed methods are trained and tested on several databases and compared quantitatively with a large number of state-of-the-art models. The experimental results demonstrate that the proposed methods can effectively predict the driver's attention regions in traffic scenes.
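Item (1) states only that a combination of several loss functions is used; it does not name them. A common pairing in saliency prediction is a weighted sum of the KL divergence and the negated linear correlation coefficient (CC) between the predicted and ground-truth maps. The minimal sketch below assumes that pairing, flattened maps stored as plain lists, and hypothetical weights `alpha` and `beta`; it illustrates the idea and is not necessarily the loss used in this paper.

```python
import math
from typing import List

EPS = 1e-7  # numerical guard against division by zero and log(0)


def _to_distribution(m: List[float]) -> List[float]:
    """Scale a non-negative saliency map so that its values sum to 1."""
    total = sum(m) + EPS
    return [v / total for v in m]


def kl_divergence(pred: List[float], gt: List[float]) -> float:
    """KL(gt || pred), treating both flattened maps as probability distributions."""
    p = _to_distribution(pred)
    q = _to_distribution(gt)
    return sum(qi * math.log(EPS + qi / (pi + EPS)) for pi, qi in zip(p, q))


def correlation_coefficient(pred: List[float], gt: List[float]) -> float:
    """Pearson linear correlation between two flattened maps."""
    n = len(pred)
    mp, mg = sum(pred) / n, sum(gt) / n
    cov = sum((a - mp) * (b - mg) for a, b in zip(pred, gt))
    sp = math.sqrt(sum((a - mp) ** 2 for a in pred))
    sg = math.sqrt(sum((b - mg) ** 2 for b in gt))
    return cov / (sp * sg + EPS)


def combined_loss(pred: List[float], gt: List[float],
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    # Lower KL and higher CC are both better, so CC enters with a minus sign.
    return alpha * kl_divergence(pred, gt) - beta * correlation_coefficient(pred, gt)
```

A perfect prediction yields a loss close to `-beta` (KL near 0, CC near 1), while a flat, uninformative map scores worse. Combining a distribution-based term (KL) with a correlation-based term is one way to penalize both misplaced probability mass and poor overall agreement.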
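Item (2) says that backbone spatial features are fused with salient-object semantics so as to increase the weights of salient objects, but the abstract does not give the fusion operator. One simple form, assumed here purely for illustration, is residual reweighting, F' = F ⊙ (1 + M), where M in [0, 1] is a salient-object map. In this sketch, M is rasterized from hypothetical detection boxes that carry saliency scores; the functions `boxes_to_mask` and `object_guided_fusion` are illustrative names, not part of SOGAF-Net.

```python
from typing import List, Tuple

Grid = List[List[float]]
# Hypothetical box format: (x0, y0, x1, y1, saliency score); x1 and y1 are exclusive.
Box = Tuple[int, int, int, int, float]


def boxes_to_mask(height: int, width: int, boxes: List[Box]) -> Grid:
    """Rasterize scored boxes into a salient-object map M with values in [0, 1]."""
    mask = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1, score in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                # Where boxes overlap, keep the strongest saliency score.
                mask[y][x] = max(mask[y][x], score)
    return mask


def object_guided_fusion(feature: Grid, mask: Grid) -> Grid:
    """Residual reweighting F * (1 + M): background positions keep their value,
    positions inside salient objects are amplified by up to a factor of 2."""
    return [[f * (1.0 + m) for f, m in zip(frow, mrow)]
            for frow, mrow in zip(feature, mask)]
```

The `1 + M` form boosts object regions without zeroing the background, so scene context (road layout, lane markings) still reaches later layers; a pure multiplicative mask `F * M` would discard it.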
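Item (3) says the DFM computes salient motion information from the spatial features of consecutive frames, without specifying how. The simplest such cue, assumed here and not taken from the paper, is the element-wise absolute difference between the feature maps of frames t-1 and t, normalized to [0, 1]. The function name `motion_map` is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def motion_map(prev_feat: Grid, curr_feat: Grid) -> Grid:
    """Inter-frame motion cue |F_t - F_{t-1}|, scaled so that the peak equals 1."""
    diff = [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_feat, curr_feat)]
    peak = max(max(row) for row in diff)
    if peak == 0:
        # Static scene: no motion anywhere.
        return [[0.0] * len(row) for row in diff]
    return [[v / peak for v in row] for row in diff]
```

Differencing features rather than raw pixels makes the cue less sensitive to illumination changes. A learned module such as the DFM would typically go further, for example by weighting the difference adaptively, which this sketch does not attempt.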
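The correlation analysis cell in item (3) enhances context by combining convolutions with different receptive fields. Its exact design is not given in the abstract, so the sketch below shows only the general idea under stated assumptions: parallel 3x3 convolutions with dilation rates 1, 2 and 3 (effective receptive fields of 3x3, 5x5 and 7x7) on a single-channel map, fused by summation. The functions `dilated_conv3x3` and `multi_receptive_field` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv3x3(x: Grid, kernel: Sequence[Sequence[float]], dilation: int) -> Grid:
    """3x3 cross-correlation (as in deep-learning frameworks) with zero padding
    equal to `dilation`, so the output has the same size as the input."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    yy = i + (ki - 1) * dilation
                    xx = j + (kj - 1) * dilation
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the map
                        acc += kernel[ki][kj] * x[yy][xx]
            out[i][j] = acc
    return out


def multi_receptive_field(x: Grid, kernels: Sequence[Sequence[Sequence[float]]],
                          dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Run one branch per dilation rate and fuse the branches by summation."""
    branches = [dilated_conv3x3(x, k, d) for k, d in zip(kernels, dilations)]
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]
```

With a single impulse at the center of a 7x7 map and all-ones kernels, the output is 3 at the center (all three branches respond) and 1 at the corners (only the dilation-3 branch reaches that far). Dilation enlarges the receptive field without adding parameters, which is one common way to gather wider context cheaply.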