| Identity authentication is the basic means of securing users’ information. Text passwords have become the main method of identity authentication on the Internet because they are simple and easy to use. However, text passwords carry serious security risks. On the one hand, users tend to choose simple passwords that are easy to remember; on the other hand, negligence by website administrators has led to the leakage of a large number of password databases. Research on password security therefore has important academic value. As the most direct way to evaluate password security, the password guessing attack is one of the hot topics in password security research. How to generate as many correct passwords as possible within a limited number of guesses is an important issue in this research. The current mainstream methods construct probabilistic context-free grammar (PCFG) models and Markov models based on statistical probabilities. What these methods have in common is that they require large-scale training datasets to estimate probabilities accurately, so they perform well on short passwords, for which sufficient data is available. Datasets of long passwords are scarce, however, so these methods are less effective at guessing long passwords. How to improve guessing performance on long passwords has thus become a second problem. To address these two problems, this article combines three lines of research with deep learning to build a model suited to guessing both long and short passwords. The article first analyzes six real-user password datasets along five dimensions: popular passwords, length distribution, letter distribution, password structure, and the similarity of the passwords used by the same user in different datasets. Weak-password behavior and the differences between Chinese and English users in constructing passwords 
are also studied. These characteristics all indicate that the distribution of characters in passwords is uneven and that applying deep learning to password guessing is feasible. Next, this article designs a password data processing method and a password generation algorithm. Guided by the length distribution of users’ passwords, a short password guessing model is built on the GPT model, which retains only the decoder part of the Transformer. This model achieves a higher coverage rate than models based on statistical probability and than other deep learning models. Finally, to overcome the difficulty of long password guessing, this article simulates how users construct long passwords from short ones and adjusts the password data processing method and the password generation algorithm to improve the model. Experiments show that the improved model achieves a higher coverage rate than the traditional probabilistic context-free grammar model on long password guessing. |