| With the advent of the 5G era and the rise of online broadcasting, the volume of image and video resources has grown substantially. The objective world has become an important source of image and video content. With the development of science and technology, people now produce and receive large numbers of images and videos in everyday life and work. An image is worth a thousand words. In daily communication, people also send images and videos via electronic devices to express their thoughts. To highlight what they want to express, people sometimes edit images and videos so that the useful information is conveyed more intuitively. However, images and videos may be blurry or noisy, or may lose color information during transmission, so users need to process them further. Driven by this growing user demand, image editing has made great progress. At the same time, images and videos can be edited according to the user's intention. Given a few color scribbles on local image regions, a new image is obtained in which the color is propagated only within those local regions. This requires the technical support of image color editing. Traditional methods require a large number of user interactions and manual parameter tuning, and are prone to inaccurate classification, which leads to color bleeding and unsatisfactory editing results. The existing RNN-based methods use a shallower neural network, which cannot extract deep features. Therefore, for complex images, these methods generate editing results with notable classification artifacts. Additionally, whenever the test data change because the user interactions are updated, the system has to train the network from scratch, which increases the computational cost. To address these problems, this paper proposes a deep learning based edit propagation method. The main contributions of this paper are as follows: (1) This paper proposes a deep edit propagation method with embedded feature 
learning. The network consists of a basic network, an embedding module, and a dynamic segmentation module. The trained network model is used as a classifier that classifies each pixel of the image according to the user interactions. First, the basic network extracts features from the image, and the embedding module then clusters the image pixels in a high-dimensional embedding feature space. Next, the distance maps, which are transformed from the user interactions, and the embedding features are fed into the dynamic segmentation head for final classification. Because the image pixels are first clustered in a high-dimensional feature space, classification accuracy improves, so the target regions can be edited with fewer user interactions. Moreover, the network does not need to be retrained when testing new images. The proposed method achieves state-of-the-art performance on the Pascal VOC validation set, with an average overlap rate of 76.86% for random sampling of 20 user clicks. (2) This paper further improves the algorithm proposed above. On the one hand, only foreground clicks are provided. Since the background and foreground compete with each other, the distance map of the background can be derived from the distance maps of the foreground (the image regions to be edited). On the other hand, a ConvLSTM unit is applied in the classification module; it has memory capacity and retains previous states, which further increases classification accuracy. The improved method achieves state-of-the-art performance on the Pascal VOC validation set, with an average overlap rate of 81.46% for random sampling of 20 user clicks, outperforming existing edit propagation methods. |
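
To illustrate how user clicks might be transformed into the distance maps described above, and how an overlap rate could be measured, the following is a minimal Python sketch. It rests on assumptions that the abstract does not state: distances are Euclidean and truncated at a cap of 255, and the overlap rate is taken to be intersection over union. The function names `click_distance_map` and `overlap_rate` are hypothetical and are not taken from the paper.

```python
import math


def click_distance_map(height, width, clicks, cap=255.0):
    """Return a height x width grid of distances to the nearest click.

    `clicks` is a list of (row, col) pairs. Euclidean distance and the
    truncation value `cap` are assumptions made for this sketch.
    """
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            # Distance to the closest click; `cap` is used when there are no clicks.
            nearest = min(
                (math.hypot(r - cr, c - cc) for cr, cc in clicks),
                default=cap,
            )
            row.append(min(nearest, cap))
        grid.append(row)
    return grid


def overlap_rate(pred, target):
    """Intersection over union of two binary masks given as nested lists.

    Treating the overlap rate as intersection over union is an assumption.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


if __name__ == "__main__":
    # One foreground click in the top-left corner of a 4 x 4 image.
    foreground_map = click_distance_map(4, 4, [(0, 0)])
    print(foreground_map[0])
    print(overlap_rate([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

In a pipeline of the kind described, such a map would be supplied to the segmentation head together with the embedding features; how the background map is derived from the foreground map in the improved variant is not specified here and is left out of the sketch.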