| Given a target pose and an appearance description, pose-guided human image generation aims to synthesize a human image that is consistent with both. As human image generation has improved, it has been widely used in academic research, film production, online entertainment, and other fields. The emergence of the Generative Adversarial Network (GAN) has driven the rapid development of human image generation, but several problems remain to be solved, including how to handle the local deformation from pose to human image, how to improve the details of the generated images (such as the quality of the human face), and how to improve the accuracy of the flow that models the complex spatial transformation between poses. This paper studies these problems and achieves the following results: (1) A spatial-consistency-constrained human image generation network is proposed. To solve the problem of local deformation from the pose to the target image, we use a segmentation map to provide aligned regions for target image generation, and design a two-stage human image generation network. In the first stage, the segmentation map is generated from the target pose and provides aligned regions for the subsequent target image generation. To improve the consistency of spatial details, we define a shape consistency loss that makes the model focus on the shape of each body part in the generated segmentation map. In the second stage, the pose and the segmentation map are concatenated as the input of the network, so the model only needs to generate suitable texture in each corresponding part to produce the target image. To improve the accuracy of the pose in the generated target image, we define a pose consistency loss that forces the pose in the generated image to closely match the input pose. The experimental results demonstrate that the images generated by the proposed 
method have accurate shapes and fine details, which improves the quality of the generated human images. (2) A human image generation network guided by multiple semantic features is proposed. Because several details (such as body-part contours and the human face) are blurred when the target image is generated from a keypoint-based pose, we use multiple semantic features to improve the accuracy of body-part shapes and the quality of the human face. We again adopt a two-stage network architecture. The first stage generates the segmentation map from the target pose. In the second stage, the keypoint information of the human body and face is used to search the training dataset for the face feature that best matches the new pose. The matched face feature is then combined with the pose and the segmentation map as the input of the network. The combined input gives the model rich face information, which addresses the problem of blurred faces. The experimental results verify that the segmentation map ensures an accurate shape and that the matched face feature makes the generated face clear, so the quality of the generated images is effectively improved. (3) A progressive flow prediction network guided by a 3D human model is proposed. A dedicated model can only generate images of a specific person. To improve the generalization ability of the model, methods that incorporate deformation estimation are used to achieve general human image generation. Since existing single-step flow prediction methods struggle to accurately model the complex spatial transformation between poses, a progressive flow prediction network is proposed. It decomposes the transformation from the reference pose to the target pose, which reduces the difficulty of flow prediction. In addition, we use the 2D projection of the 3D human model as a dense pose representation and feed it to the network to improve the accuracy of the predicted flow. Finally, the target image can be generated through 
feature transfer of the reference image based on the predicted flow. The experimental results demonstrate that the proposed method improves the accuracy of the predicted flow, which ensures the accuracy of the generated features. As a result, texture details are better preserved. (4) A shape-aware partial flow prediction network is proposed. Existing flow-based methods struggle to accurately model the deformation when the human pose changes greatly. Hence, we propose a shape-aware partial flow prediction network. The method exploits the local correlation of pose transformation to predict the local flow within body parts that share the same semantics. We define a shape-aware loss that makes the model focus on the flow within the corresponding part region. This provides an accurate source feature for each target position, which ensures the accuracy of the generated features. Experimental results demonstrate that the images generated by the proposed method have accurate semantics and shape. |
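
The flow-based feature transfer mentioned in contributions (3) and (4) can be illustrated with a minimal sketch. It is not the authors' implementation: the backward-flow convention (each target position reads from the reference at its own coordinates plus the flow), bilinear sampling, border clamping, and the names `bilinear_sample` and `warp_by_flow` are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]  # one feature channel, indexed as grid[y][x]


def bilinear_sample(feat: Grid, x: float, y: float) -> float:
    """Sample feat at a real-valued (x, y); coordinates are clamped to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_by_flow(ref: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward warping (assumed convention): target (x, y) reads the
    reference feature at (x + flow_x[y][x], y + flow_y[y][x])."""
    h, w = len(ref), len(ref[0])
    return [
        [bilinear_sample(ref, x + flow_x[y][x], y + flow_y[y][x]) for x in range(w)]
        for y in range(h)
    ]


# Toy reference feature map and a flow that reads one pixel to the right.
ref = [[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0]]
flow_x = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
flow_y = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
warped = warp_by_flow(ref, flow_x, flow_y)
print(warped)
```

In this toy case every target position takes the feature of its right-hand neighbour in the reference, and the last column repeats the border value because of clamping. A more accurate flow therefore directly means that each target position receives a more appropriate source feature, which is the property the abstract relies on.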