With the advancement of big data technology and the emergence of large-scale models, deep learning has proven highly effective in tackling a wide range of complex problems in natural language processing. However, the security of deep learning models has become a focal point for both academia and industry. Adversarial attacks have been identified as a major security threat: by introducing imperceptible alterations to the original samples, an attacker can mislead a model into making incorrect predictions, which poses a serious challenge to model performance and reliability. To improve the security and robustness of deep learning models, researchers have proposed various adversarial defense methods. However, most of these methods are designed for specific attacks or models and lack generality, which limits their effectiveness in real-world applications. Additionally, large language models depend heavily on the quality and scale of their data, yet high-quality data is difficult to obtain in practical scenarios, and the data distribution is often unbalanced. Improving the quality of augmented samples is therefore a vital strategy for strengthening the robustness of machine learning models when the original data is of low quality. To address these challenges, we propose textual adversarial defense and application methods from three perspectives. Compared with prior research, our main innovations and advantages are summarized as follows:

(1) Existing detection techniques are designed for a single attack type and rely strongly on attack-specific adversarial features, which leads to insufficient generalization. We therefore propose an adversarial detection method based on perturbation sensitivity inconsistency. Adversarial features are extracted from a universal property of adversarial samples, their sensitivity to perturbation, represented as vectors, and used to train a machine learning-based detector (a minimal sketch is given after this overview). The proposed method achieves excellent transferability and robustness: it detects attacks effectively even when the task, model structure, and attack method differ entirely from those seen during training. Experimental results show that our method achieves a detection recall of 99.7% and an F1-score of 97.8% on the IMDB dataset, surpassing existing advanced adversarial detection methods.

(2) We propose a certifiable adversarial defense method based on random masking and purification (MPD). It relies on no attack assumptions and no target-model information, so it can be added to any deep learning model without retraining (a voting sketch is given after this overview). Its certification has been verified through probabilistic proofs and simulation experiments, which show that the lower bounds of MPD’s robustness hold at mask rates μ ≥ 0.2 on the IMDB dataset and μ ≥ 0.6 on the SST-2 dataset. Compared with existing adversarial defense techniques, this method achieves a better balance between accuracy on clean samples and robustness under adversarial attacks.

(3) We propose a data augmentation method that simulates the adversarial process. Data augmentation is critical for addressing insufficient data and unbalanced data distributions. Inspired by textual adversarial training, the proposed method generates augmented samples from the intermediate samples produced during an adversarial attack, so that they cover the decision blind spots of the model (a sketch is given after this overview).
Compared with existing data augmentation methods, the augmented samples generated in this way cover the decision space more broadly. Moreover, a weighted classifier that combines probabilistic information is designed according to the domain characteristics, which effectively improves the model’s accuracy on categories with insufficient samples. Extensive experiments demonstrate that this method improves model accuracy and F1-score by 15.1% and 14.7%, respectively, and that it outperforms other state-of-the-art data augmentation methods, improving accuracy by a further 1.8%.
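As a minimal illustration of the perturbation-sensitivity idea in (1), the sketch below perturbs an input several times, vectorizes how much the prediction moves, and trains a small detector on those features. Everything here is a hypothetical stand-in, not our actual implementation: `model_predict_proba` is a toy placeholder for the target classifier, and the texts, masking scheme, and feature set are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def model_predict_proba(text: str) -> np.ndarray:
    """Toy placeholder for the target classifier (returns class probabilities).
    In practice this would wrap the forward pass of a real NLP model."""
    h = (abs(hash(text)) % 1000) / 1000.0
    return np.array([h, 1.0 - h])

def mask_words(text: str, rate: float = 0.15) -> str:
    """Randomly replace a fraction of the words with a [MASK] token."""
    return " ".join("[MASK]" if rng.random() < rate else w
                    for w in text.split())

def sensitivity_features(text: str, n_perturb: int = 20) -> np.ndarray:
    """Vectorize how unstable the prediction is under random masking:
    mean/std/max confidence shift plus the label-flip rate."""
    p0 = model_predict_proba(text)
    shifts, flips = [], []
    for _ in range(n_perturb):
        p = model_predict_proba(mask_words(text))
        shifts.append(float(np.abs(p - p0).max()))
        flips.append(float(p.argmax() != p0.argmax()))
    s = np.array(shifts)
    return np.array([s.mean(), s.std(), s.max(), float(np.mean(flips))])

# Train a lightweight detector on texts labeled clean (0) / adversarial (1).
texts = ["a fine movie overall", "a f1ne m0vie overa11",
         "the plot was engaging", "the pl0t was engag1ng"]
labels = [0, 1, 0, 1]
X = np.stack([sensitivity_features(t) for t in texts])
detector = LogisticRegression().fit(X, labels)
print(detector.predict(X))
```

The intuition is that adversarial inputs sit close to the decision boundary, so small random perturbations flip their predictions far more often than they do for clean inputs; the detector only sees this sensitivity signature, which is why it can transfer across tasks, models, and attacks.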
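The defense in (2) can be pictured as mask, purify, and vote: each input is randomly masked at rate μ, purified, classified, and the final label is a majority vote over many such copies. The sketch below is a simplified, assumption-laden version: `model_predict` is a toy placeholder, and the purification step here merely drops masked tokens, whereas a realistic implementation would reconstruct them, for example with a masked language model.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def model_predict(text: str) -> int:
    """Toy placeholder for the protected classifier (returns a class label)."""
    return int((abs(hash(text)) % 1000) / 1000.0 > 0.5)

def random_mask(text: str, mu: float) -> str:
    """Mask each word independently with probability mu (the mask rate)."""
    return " ".join("[MASK]" if rng.random() < mu else w
                    for w in text.split())

def purify(text: str) -> str:
    """Stand-in purification step: simply drop masked positions.
    A real implementation would reconstruct them before classification."""
    return " ".join(w for w in text.split() if w != "[MASK]")

def mpd_predict(text: str, mu: float = 0.3, n_votes: int = 25) -> int:
    """Majority vote over independently masked-and-purified copies.
    The mask rate mu controls the accuracy/robustness trade-off; the
    certified bounds above concern mu >= 0.2 (IMDB) and mu >= 0.6 (SST-2)."""
    votes = Counter(model_predict(purify(random_mask(text, mu)))
                    for _ in range(n_votes))
    return votes.most_common(1)[0][0]

print(mpd_predict("an unexpectedly delightful film"))
```

Larger mask rates destroy more of the adversarial perturbation but also more of the clean signal, which is exactly the trade-off that the certified lower bounds quantify. Because the vote only needs query access to the classifier, the wrapper requires no model internals and no retraining.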
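Finally, for the augmentation method in (3), the sketch below runs a toy greedy word-substitution attack and keeps every intermediate sample that is still classified correctly as an augmented training example, and a hypothetical frequency-weighted decision rule illustrates the weighted classifier. The synonym table, placeholder model, and weighting scheme are all illustrative assumptions rather than the method as implemented.

```python
import random
import numpy as np

rnd = random.Random(0)

def model_predict_proba(text: str) -> np.ndarray:
    """Toy placeholder classifier (returns class probabilities)."""
    h = (abs(hash(text)) % 1000) / 1000.0
    return np.array([h, 1.0 - h])

# Tiny illustrative substitution table; a real attack would use a
# synonym resource or embedding-based candidate generation.
SYNONYMS = {"good": ["fine", "great"], "movie": ["film", "picture"]}

def intermediate_augmentation(text: str, label: int, max_steps: int = 5):
    """Greedy word-substitution walk toward the decision boundary.
    Every intermediate sample that is still classified correctly is
    kept as an augmented example carrying the original label."""
    augmented, words = [], text.split()
    for _ in range(max_steps):
        candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
        if not candidates:
            break
        i = rnd.choice(candidates)
        words[i] = rnd.choice(SYNONYMS[words[i]])
        new_text = " ".join(words)
        if int(model_predict_proba(new_text).argmax()) != label:
            break  # crossed the decision boundary: stop collecting
        augmented.append((new_text, label))
    return augmented

def weighted_predict(text: str, class_freq: np.ndarray) -> int:
    """Hypothetical weighted decision rule: rescale predicted
    probabilities by inverse class frequency so that classes with
    few samples are not systematically drowned out."""
    return int(np.argmax(model_predict_proba(text) / class_freq))

print(intermediate_augmentation("a good movie", label=1))
print(weighted_predict("a good movie", class_freq=np.array([0.8, 0.2])))
```

Because the intermediate samples lie along the attack trajectory rather than at random perturbation points, they populate regions near the decision boundary that ordinary augmentation rarely reaches, which is what gives them broader coverage of the decision space.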