| Saliency detection is an important pre-processing step in image processing. In recent years, various detection models have emerged. Traditional detection models mostly rely on contrast and prior knowledge, and their limited computing power degrades the saliency results; for images with low contrast and complex environments in particular, they often fail to detect the salient object. Deep detection models achieve better results thanks to their local receptive fields and parameter sharing, as well as their powerful feature extraction and learning abilities, but they also have problems, such as coarse detection results and the loss of important information. Therefore, starting from a pixel-level deep network, this paper uses a region-level traditional method, a region-level loss, and network optimization to refine the detection results and compensate for the lost information, thereby improving detection accuracy. The main work and contributions are as follows: (1) For images with low contrast and complex environments, a saliency detection model based on multi-layer cellular automata is proposed, which combines global and local information. First, an encoder-decoder network based on a convolutional neural network is trained iteratively under supervision to extract the global features of images. The encoder and decoder use VGG-16 and its symmetric network for feature extraction and reconstruction, respectively. Second, the pixel-level global features are used to guide the coding of superpixel features. From the global saliency map, foreground and background codebooks are generated using adaptive thresholds; the two codebooks are each encoded by the locality-constrained linear coding method, and the two results are fused to generate a local saliency map with local detail information. Finally, using the stable posterior probability from Bayesian theory within the multi-layer cellular automata framework, the extracted global and local saliency maps are fused to produce the final
saliency map, which contains both global contour information and local detail information. Experimental results show that the average F-measure of our model on the ECSSD, DUT-OMRON and PASCAL datasets increases by 1.8%, 1.7% and 0.8%, respectively, compared with the global saliency results, and the corresponding Mean Absolute Error (MAE) values are 0.129, 0.129 and 0.171, respectively. This shows that our model has better accuracy and generalization performance. (2) The receptive field structure and pooling operations of a deep network cause the loss of object boundary information and high-level information, and these losses are irreversible, so salient objects cannot be reconstructed well during deconvolution. Therefore, a U-Res-Net detection network combining U-Net and ResNet is proposed, and an adaptive affinity loss is incorporated. First, the U-Net framework is used to build the network structure. Skip connections are added between the convolution and deconvolution parts to transfer the high-level information of the network directly to the lower layers, which enriches the low-level information, achieves multi-scale feature fusion, and reduces the loss of high-level information. Second, the convolution and deconvolution parts of the network adopt ResNet-50 and its symmetric network, which increases the network depth and extracts more global abstract features without loss of precision, while remaining easy to train. Finally, the pixel-level loss is extended to a region-level loss that integrates the spatial structure relationships between pixels and treats the object boundary and interior separately, thereby strengthening boundary information, making object boundaries clearer and object interiors more cohesive. Adaptive kernel sizes handle images with objects of different sizes. Experimental results show that the average F-measure of the network on the MSRA10K, ECSSD, DUT-OMRON, HKU-IS, THUR15K and
XPIE datasets is 93.4%, 89.6%, 77.9%, 87.5%, 72.2% and 82.2%, and the MAE is 0.044, 0.073, 0.060, 0.059, 0.089 and 0.076, respectively, a further improvement in detection accuracy. |
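
The two evaluation metrics quoted above, F-measure and MAE, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact evaluation protocol, so the details here are assumptions drawn from common practice in the saliency literature: saliency maps and ground truth are scaled to [0, 1], the F-measure weight is beta squared = 0.3, and the default binarization threshold is twice the mean saliency (capped at 1). The function names `mean_absolute_error` and `f_measure` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_absolute_error(saliency, ground_truth):
    """Mean absolute per-pixel difference between a saliency map and its mask."""
    s = np.asarray(saliency, dtype=np.float64)
    g = np.asarray(ground_truth, dtype=np.float64)
    return float(np.mean(np.abs(s - g)))


def f_measure(saliency, ground_truth, beta_sq=0.3, threshold=None):
    """Weighted F-measure of a binarized saliency map against a binary mask.

    Assumptions (not stated in the abstract): beta_sq = 0.3 and, when no
    threshold is given, an adaptive threshold of twice the mean saliency.
    """
    s = np.asarray(saliency, dtype=np.float64)
    g = np.asarray(ground_truth, dtype=np.float64) > 0.5
    if threshold is None:
        threshold = min(2.0 * s.mean(), 1.0)  # assumed adaptive threshold
    pred = s >= threshold
    true_pos = np.logical_and(pred, g).sum()
    if true_pos == 0:
        # No correctly detected salient pixel: precision or recall is zero.
        return 0.0
    precision = true_pos / pred.sum()
    recall = true_pos / g.sum()
    return float((1 + beta_sq) * precision * recall / (beta_sq * precision + recall))


if __name__ == "__main__":
    sal = np.array([[0.9, 0.6], [0.7, 0.0]])
    gt = np.array([[1, 1], [0, 0]])
    print("MAE:", mean_absolute_error(sal, gt))
    print("F-measure:", f_measure(sal, gt, threshold=0.5))
```

Dataset-level scores such as those reported above would then be averages of these per-image values over each benchmark, again assuming the usual protocol.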