Research On Deepfake Video Detection Algorithm Based On Spatio-temporal Fusion

Posted on: 2023-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Wang
GTID: 2558306845499014
Subject: Signal and Information Processing
Abstract/Summary:
Digital video is widely used in news media, forensic identification, and other fields. However, as information technology has developed, increasingly powerful video editing tools have become available, and more users can freely edit videos. This ease of processing and modification gives malicious users an opportunity, and it makes the authenticity and integrity of a video difficult to guarantee. The widely used deepfake technology can create fake videos by swapping the faces of different people. The results are almost indistinguishable to the human eye and pose a serious threat to information security. Therefore, this thesis studies deepfake video detection algorithms based on deep learning, with the aim of revealing whether a video has been tampered with by deepfake technology and verifying the authenticity of the video data. The main work includes:

(1) A deepfake video detection algorithm based on spatiotemporal features is proposed. The algorithm contains a temporal feature extraction module and a spatial feature extraction module. The temporal module captures the discontinuity between deepfake video frames, and the spatial module extracts forgery traces in the spatial domain. A corresponding fusion module is then designed to exploit the complementary strengths of the two feature streams. The experimental results show that, compared with mainstream algorithms based on spatial features, the accuracy of the proposed model on the Celeb-DF and FaceForensics++ datasets increases by 1.07% and 3.13%, respectively.

(2) A deepfake video detection algorithm based on spatiotemporal attention is proposed. The algorithm uses a feature extraction module and an attention-guided long short-term memory (LSTM) module to extract more effective spatiotemporal features. First, the feature extraction module extracts high-level semantic features from the fully connected layers of the backbone network and spatial features from its mid-level convolutional layers. The extracted feature maps are then fed into the attention-guided LSTM module to learn spatiotemporal information. The attention-guided LSTM module includes a temporal attention module and a spatiotemporal attention module, both of which aim to focus on key artifact information in videos. The experimental results show that, compared with popular deepfake detection algorithms, the accuracy of the proposed model on the Celeb-DF and FaceForensics++ datasets increases by 1.33% and 1.89%, respectively.

(3) A deepfake video detection algorithm based on cross-modal spatiotemporal fusion is proposed. The algorithm uses a spatial convolutional neural network as the backbone to extract visual features, and an audio network is designed to extract audio features, which serve as an attention stream that guides visual modeling in the spatial dimension. In addition, an audio-video interaction module is designed to ensure the fusion of audio and video features. The experimental results show that, compared with current advanced deepfake detection algorithms, the accuracy of the proposed model on the FakeAVCeleb and DFDC datasets increases by 3.87% and 2.96%, respectively, which further verifies the effectiveness of the proposed model for the deepfake video detection task.
Keywords/Search Tags: Deepfake video detection, spatiotemporal features, audio features, cross-modal fusion
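The two-branch idea in item (1) of the abstract, a spatial stream plus a temporal stream that reacts to frame-to-frame discontinuity, followed by fusion, can be illustrated with a toy sketch. This is not the thesis's architecture. It assumes each frame has already been reduced to a small feature vector, uses absolute differences between consecutive frames as a stand-in for the temporal module, uses a norm-based softmax as a stand-in for attention, and fuses by plain concatenation. All function names (`frame_differences`, `attention_pool`, `fuse`) are hypothetical.

```python
import math


def frame_differences(frames):
    """Temporal branch (assumed): absolute difference between the feature
    vectors of consecutive frames; large values hint at discontinuity."""
    return [[abs(b - a) for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors):
    """Pool a sequence of vectors into one vector, weighting each vector
    by the softmax of its L2 norm (a hand-made stand-in for learned attention)."""
    scores = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def fuse(frames):
    """Fusion (assumed): concatenate the pooled spatial and temporal vectors."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a temporal branch")
    spatial = attention_pool(frames)
    temporal = attention_pool(frame_differences(frames))
    return spatial + temporal  # list concatenation, not element-wise addition


# Toy input: three frames with two features each; the last frame jumps.
frames = [[0.1, 0.2], [0.1, 0.2], [0.9, 0.1]]
fused = fuse(frames)
```

For a perfectly static clip the temporal half of the fused vector is all zeros, while an abrupt change between frames produces non-zero temporal entries. A real detector would replace each hand-made step with learned layers and feed the fused vector to a classifier.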