| Visual perception has always been the primary way humans engage with the external world, and images and videos, as the main carriers of external information, have become the basis on which humans explore it. Segmenting the primary objects in images and videos plays an important role in a wide range of practical applications. Although deep-learning-based fully convolutional neural networks have made incremental progress on dense prediction tasks, many issues remain. Because image scenes are complex and diverse (e.g., easily confused objects or low-contrast backgrounds), recent methods cannot precisely locate foreground objects or restore fine boundary details. Because of the challenges posed by complex scenes and motion conditions in videos (e.g., occlusion, motion blur), recent methods cannot maintain temporally stable object segmentation. Relying purely on image information, recent methods are disturbed by noise and invalid information, which severely degrades performance. By introducing external modality information, this paper demonstrates the significance of external-modality enhancement and multi-modality collaboration for improving segmentation performance, and designs a novel salient object detection method and a novel video object segmentation method. For image object segmentation, this paper designs a multi-attribute collaboration network that models internal depth and edge attribute features from the raw image and uses the stable geometric structure information in these multi-attribute features to provide robust cues, achieving accurate localization of foreground objects. For video object segmentation, this paper designs an appearance-motion collaboration network that uses optical flow, which represents the external motion modality, to provide motion information, and adaptively fuses the effective information of the multi-modality features to jointly generate robust spatiotemporal feature representations. Built on a simple, novel, and effective multi-modality aggregation module, the proposed image and video segmentation methods were evaluated in extensive experiments on several challenging benchmark datasets and achieved superior performance under different evaluation metrics. |