| Image reconstruction and frame interpolation are important research topics in computer vision and graphics, and they play an important role in image enhancement, image compression, and slow-motion video generation. Because mainstream frame-based image reconstruction and frame interpolation algorithms rely on the assumptions of constant luminance and linear motion, they perform poorly under nonlinear motion or large luminance changes, and their performance degrades severely in high-speed motion scenes. Traditional frame-based algorithms are therefore limited when dealing with complex motion, strong contrast between light and dark, or high-speed scenes. In recent years, image reconstruction and frame interpolation combined with event cameras has become an active research topic; such algorithms can compensate for the shortcomings of frame-based algorithms by exploiting the low latency and high dynamic range of event data. On this basis, this paper pursues research and improvements in the following three aspects: (1) To address the problem that traditional convolutional neural networks cannot handle event data, which are asynchronous and non-uniformly distributed in space and time, this paper converts event data into a grid-based learned representation in a data-driven manner through a series of differentiable operations. This allows convolutional neural networks to learn high-dimensional features of event data and image data end to end, thus solving the problems of event data representation and of data fusion between event cameras and traditional charge-coupled device or CMOS image sensors (CCD/CIS). The experimental results show that the event grid learning representation designed in this paper can be applied effectively to the object classification task and the optical flow estimation 
task: classification accuracy reaches 92.5% on the N-MNIST dataset and 82.76% on the N-Caltech101 dataset, and the average end-point error (AEE) of optical flow estimation on the MVSEC dataset decreases by 8% on average compared with EV-FlowNet. (2) To address the problem that traditional frame-based image reconstruction and frame interpolation algorithms cannot effectively estimate pixel motion under nonlinear motion, in high-speed scenes, or in scenes with strong contrast between light and dark, this paper proposes EV-Fusion, an image reconstruction and frame interpolation algorithm based on multimodal data fusion, which fuses event data with image data to accomplish image generation and frame interpolation in high-speed scenes or scenes with strong contrast between light and dark. The experimental results show that the five-fold interpolation results generated by the proposed EV-Fusion model achieve a peak signal-to-noise ratio (PSNR) of 33.46 and a structural similarity index (SSIM) of 0.816, while the seven-fold results reach 32.35 and 0.815, respectively; both outperform mainstream frame-based image reconstruction and frame interpolation algorithms such as Super SloMo, DAIN, BMBC, and RRIN. (3) To address the problem that traditional deep learning algorithms based on supervised learning need large high-frame-rate datasets as ground-truth image samples, and that supervised learning cannot break through the frame-rate limit of those datasets, this paper proposes Unsupervised EV-Fusion, an image reconstruction and frame interpolation algorithm based on a cis-temporal unsupervised learning mechanism. It uses the "full pixel motion coverage" property of event data to complete the image generation and frame interpolation tasks cyclically and self-consistently. The method is thus free 
from the requirement of supervised image reconstruction and frame interpolation algorithms for a high-frame-rate dataset, and can achieve interpolation rates beyond the upper frame-rate limit of the datasets. The experimental results show that even though the Unsupervised EV-Fusion model has no high-frame-rate ground-truth images as learning targets, the PSNR and SSIM of its five-fold results still reach 32.85 and 0.820, and those of its seven-fold results reach 32.20 and 0.816; both outperform current unsupervised image reconstruction and frame interpolation algorithms such as Unsupervised Super SloMo and Time Replayer. |
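
As a rough illustration of what a grid-based event representation can look like, the sketch below bins a list of events into a fixed spatio-temporal voxel grid, spreading each event over its two nearest temporal bins. It is a simplified, hand-crafted stand-in and not the learned representation described in the abstract: the `(x, y, t, polarity)` event layout, the function name `events_to_voxel_grid`, and the fixed triangular kernel are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, float, int]


def events_to_voxel_grid(events: Sequence[Event], num_bins: int,
                         height: int, width: int) -> List[List[List[float]]]:
    """Accumulate events into a [num_bins][height][width] grid.

    Each event's polarity is split between the two nearest temporal bins
    with linear (triangular-kernel) weights. The fixed kernel is an
    assumption; a learned representation would use a trainable one instead.
    """
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid

    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = t_max - t_min

    for x, y, t, p in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore events outside the sensor area
        # Normalize the timestamp to the range [0, num_bins - 1].
        t_norm = (t - t_min) / span * (num_bins - 1) if span > 0 else 0.0
        lower = int(t_norm)
        frac = t_norm - lower
        grid[lower][y][x] += p * (1.0 - frac)
        if frac > 0.0 and lower + 1 < num_bins:
            grid[lower + 1][y][x] += p * frac
    return grid


# Usage: three events on a 2x2 sensor, binned into three temporal slices.
example_events = [(0, 0, 0.0, 1), (1, 0, 0.25, -1), (1, 1, 1.0, 1)]
example_grid = events_to_voxel_grid(example_events, num_bins=3, height=2, width=2)
```

Because the resulting grid is a dense, fixed-size array, it can be stacked with image channels and passed to an ordinary convolutional network, which is the property a grid-based representation is meant to provide for asynchronous event streams.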