| Detecting forged images and videos has long been a research focus in computer vision and information security. With the rapid development of deep generative models such as autoencoders and generative adversarial networks, people can easily manipulate images and videos with open-source tools and even mobile apps. The best-known form of such manipulation, the deepfake, refers to multimedia content manipulated via deep learning. Because deepfakes are easy to produce and yield high-quality forgeries, these techniques are easily abused by malicious users for purposes such as spreading fake news, revenge pornography, financial fraud, and even influencing political opinion. Developing effective deepfake detection methods is therefore an urgent and important need. Although deepfake detection algorithms have received considerable research attention, current methods still have the following limitations. First, some local manipulations and high-quality deepfakes leave only subtle forgery artifacts, which limits the effectiveness of most detection methods. Second, detection algorithms are not robust in real-world applications, such as scenarios where fake images or videos uploaded to social media platforms are compressed to varying degrees. Third, existing detection models usually train deep neural networks directly on data for classification. Although they achieve impressive results when tested within a dataset, they generalize poorly across different datasets or forgery methods. To tackle these challenges and improve the effectiveness, generalization, and robustness of deepfake detection, this thesis studies deepfake detection based on forgery defects and semantic comparison and proposes two novel methods. To improve both the effectiveness of detecting different deepfake manipulation methods and the robustness in compressed scenarios, this thesis proposes a deepfake detection algorithm based on local forgery defects in the spatial and frequency domains, named Local
Artifact-aware Deepfake Detection Network (LA-Net). This algorithm aims to amplify the local differences between real and fake images implied in the spatial and frequency domains, rather than relying on specific appearance features. Two modules are designed for this purpose: the Local Style Extraction Block (LSEB) encodes local styles in the spatial domain to extract more discriminative features; the Patch-wise Frequency Spectrum Cross Attention (PFSCA) module interactively mines local forgery artifacts from the amplitude and phase spectra in the frequency domain. Through a two-branch deep learning network, the model captures subtle local forgery defects in both the spatial and frequency domains. Extensive experiments under different manipulation methods and compression scenarios demonstrate the effectiveness and robustness of the proposed deepfake detection method. To improve the generalization of deepfake detection, this thesis focuses on how various deepfake forgery algorithms manipulate human face content and information. Taking the forgery defects in facial semantic content as a starting point, this thesis proposes a novel detection algorithm based on semantic segmentation and contrastive classification. The proposed method separates facial semantic content from non-semantic content and then performs additional contrastive learning on the semantic features. Semantic segmentation enables the model to learn forgery defects in semantic content that are common to different deepfakes. Contrastive learning further improves the generalizability of the model by pulling features of the same class together while increasing the distance between different classes. Extensive cross-dataset experiments and comparative evaluations demonstrate that the proposed method can significantly improve the generalization ability of current deepfake detection models based on deep convolutional neural networks. |
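
The contrastive objective described above, which pulls same-class features together and pushes different-class features apart, can be illustrated with a minimal sketch. The abstract does not specify the exact loss used in the thesis, so the code below assumes a generic pairwise margin-based contrastive loss; the function names, the `margin` parameter, and the toy feature values are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Mean margin-based contrastive loss over all unordered pairs.

    Assumed formulation (not taken from the thesis):
    - same-label pairs are penalised by their squared distance,
      which pulls them together;
    - different-label pairs are penalised only when they are closer
      than `margin`, which pushes them apart.
    """
    pairs = list(combinations(range(len(features)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        d = euclidean(features[i], features[j])
        if labels[i] == labels[j]:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
    return total / len(pairs)


# Toy 2-D "semantic features"; label 0 = real, label 1 = fake (illustrative values).
labels = [0, 0, 1, 1]
clustered = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]]  # classes well separated
mixed = [[0.0, 0.0], [2.0, 2.0], [0.1, 0.0], [2.1, 2.0]]      # classes interleaved

loss_clustered = pairwise_contrastive_loss(clustered, labels)
loss_mixed = pairwise_contrastive_loss(mixed, labels)
```

In this toy setting the loss is small when real and fake features form separate clusters and large when they are interleaved, which is the behaviour the contrastive term is meant to encourage in the learned semantic features.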