Research On Long-Term Video Prediction Using Taylor Disentanglement

Posted on: 2023-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Pan
Full Text: PDF
GTID: 2558306914481984
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of the 5G information society and continuous innovation in computer vision, videos containing rich temporal information have received growing attention, and video prediction has become a hot topic in deep learning research. Video prediction uses a series of historical video frames to predict future video frames. The task is an intermediate step between raw video data and a decision-making system: it extracts latent patterns of dynamic evolution from raw video data and has broad application prospects in meteorology, transportation, and robotics. Mainstream video prediction models mostly fall into three frameworks: extensions of recurrent neural networks, predictions conditioned on proxy objects, and specific architectures based on a factorized prediction space. Unfortunately, existing work on video prediction fails to balance short-term and long-term prediction performance and does not extract robust latent dynamics from video frames. In response to these problems, the main work and innovations of this thesis are summarized as follows:
(1) A novel principle for feature separation, Taylor feature separation, is proposed. It is inspired by the Taylor series, an important approximation method in physics. The separation is mathematically interpretable, unlike explicit feature separations that follow human intuition, such as foreground versus background. Because this separation mode carries a mathematical prior, it reduces the difficulty of feature separation and makes dynamic modeling easier. Furthermore, the Taylor series applies to any differentiable function, so the Taylor prior is also applicable to complex, chaotic systems.
(2) Based on this principle, a novel recurrent prediction module, TaylorCell, is proposed. It contains a Taylor prediction unit (TPU) and a memory correction unit (MCU). The TPU uses only a finite number of derivatives of the first input frame to predict future frames, which avoids error accumulation. The MCU corrects the Taylor feature predicted by the TPU by distilling information from all past frames through a gating mechanism.
(3) By integrating TaylorCell into a two-branch model, the thesis proposes a novel video prediction model, TaylorNet. A Taylor series has the property that the approximation error grows with distance from the expansion point. The proposed TaylorNet is therefore better suited to long-term than to ultra-long-term prediction, and it works better on datasets with short-range spatial dependencies and stable dynamics. Moreover, TaylorNet has a small number of parameters. On three general datasets, TaylorNet matches state-of-the-art models in short-term forecasting and outperforms them in long-term forecasting.
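The Taylor prior described in the abstract, and the remark that the error grows with distance from the expansion point, can be illustrated on a toy scalar signal. The following is a minimal sketch and not the TaylorCell or TPU from the thesis: the function name `taylor_extrapolate`, the choice of f(t) = sin(t), the truncation at third order, and the horizon values are all assumptions made for illustration.

```python
import math


def taylor_extrapolate(derivatives, dt):
    """Evaluate a truncated Taylor series at offset dt from the expansion point.

    derivatives[k] is the k-th derivative of the signal at the expansion point
    (hypothetical helper, not part of the thesis).
    """
    return sum(d * dt**k / math.factorial(k) for k, d in enumerate(derivatives))


# Toy signal f(t) = sin(t), expanded around t0 = 0 (assumed for illustration).
# Derivatives of sin at 0 up to order 3: sin(0), cos(0), -sin(0), -cos(0).
derivs_at_t0 = [0.0, 1.0, 0.0, -1.0]

# Prediction horizons measured from the expansion point.
horizons = [0.5, 1.0, 2.0, 3.0]

# Absolute error of the truncated series against the true signal.
errors = [
    abs(taylor_extrapolate(derivs_at_t0, dt) - math.sin(dt)) for dt in horizons
]

for dt, err in zip(horizons, errors):
    print(f"dt={dt:.1f}  abs error={err:.4f}")
```

In this toy setting every prediction depends only on derivatives taken at the first time step, so earlier predictions are never fed back in, but the truncation error increases as the horizon moves away from the expansion point. This is consistent with the abstract's statement that the approach suits long-term rather than ultra-long-term prediction.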
Keywords/Search Tags: video prediction, feature separation, deep learning, spatiotemporal sequence prediction