
CAPTCHA Recognition Technology Based On Deep Learning

Posted on: 2022-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Mu
GTID: 2558307145462284
Subject: Electronic and communication engineering
Abstract/Summary:
A CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a challenge-response test for distinguishing humans from automated computer programs, commonly known as a verification code. At present, text and image CAPTCHAs are the most widely deployed types on mainstream websites. This thesis proposes cracking algorithms for text CAPTCHAs and click-based image CAPTCHAs. The main research is as follows:

1. A text CAPTCHA dataset, an image CAPTCHA dataset, a single Chinese character image dataset and a corpus dataset are constructed. The text CAPTCHA dataset includes Tencent and Sina CAPTCHAs obtained by crawlers and two other text CAPTCHAs generated with Python and Java, with a total of 38,000 images at a resolution of 168×64. The image CAPTCHA dataset includes sequential CAPTCHAs and language-sequence CAPTCHAs obtained by crawlers from GeeTest, with a total of 5,000 images at a resolution of 344×384. The single Chinese character image dataset is generated with Python from TTF font files at 64×64 resolution, with 100,000 images in total covering 5,529 Chinese characters. The corpus dataset is a news dataset from People's Daily and the Sinoimex Information Opinion Platform, with a total of 100,000 news texts. The two CAPTCHA datasets and the single Chinese character image dataset are each divided into training, validation and test sets in a 6:2:2 ratio.

2. A text CAPTCHA recognition algorithm combining CRNN with Connectionist Temporal Classification (CTC) is studied and implemented. DenseNet serves as the convolutional part and BiLSTM as the recurrent part, trained with EnrCTC (Maximum Conditional Entropy Regularization and Rescale Pseudo GT for CTC) to recognize the four text CAPTCHAs above. EnrCTC is obtained by optimizing the CTC loss function with a rescaled maximum-entropy suppression algorithm: rescaling controls the proportion of blank and non-blank regions in the output path, and maximum-entropy regularization suppresses the dominance of the maximum-probability path. This addresses CTC's tendency to produce spiky output distributions and improves the recognition rate of the overall network. The final average recognition accuracy is 90.64%, which is 2.34% higher than that of the network trained with the standard CTC loss.

3. A point-and-click image CAPTCHA recognition algorithm based on YOLO v3 is designed and implemented. Recognition involves four steps. (1) A YOLO v3 network is trained on the image CAPTCHA dataset to automatically detect the positions of the characters to be clicked and the position of the click rule. (2) ResNet is trained to classify the single Chinese character image dataset, yielding a pre-trained model; for transfer learning on the single Chinese character images cropped from the image CAPTCHA dataset, the fully connected layer is kept trainable and the other layers are frozen. (3) A maximum probability algorithm is used to increase the probability that the recognized click-character sequence corresponds to the click rule. (4) An N-gram algorithm is used to rearrange disordered Chinese characters into word order. The experimental results show that the pass rate on language-sequence CAPTCHAs in the image CAPTCHA dataset is 72.1% and the pass rate on sequential CAPTCHAs is 85.3%; with the maximum probability algorithm, the pass rate on sequential CAPTCHAs increases to 90.2%.
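The N-gram reordering in step (4) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a character-level bigram model with add-one smoothing and an exhaustive search over permutations, and the names `train_bigrams`, `score`, `reorder` and the three-string `toy_corpus` are hypothetical stand-ins for a model trained on the news corpus.

```python
from collections import Counter
from itertools import permutations
import math


def train_bigrams(corpus):
    """Count character unigrams and bigrams in an iterable of strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def score(sequence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a character sequence."""
    vocab = len(unigrams) + 1  # +1 leaves room for unseen characters
    total = 0.0
    for prev, cur in zip(sequence, sequence[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total


def reorder(chars, unigrams, bigrams):
    """Return the ordering of chars with the highest bigram score."""
    best = max(permutations(chars), key=lambda p: score(p, unigrams, bigrams))
    return "".join(best)


# Hypothetical toy corpus; a real model would be trained on the news corpus.
toy_corpus = ["人民日报", "日报新闻", "人民新闻"]
unigrams, bigrams = train_bigrams(toy_corpus)

# Characters as recognized in arbitrary order from the CAPTCHA image.
print(reorder("报民日人", unigrams, bigrams))  # prints 人民日报
```

Exhaustive search costs n! evaluations, which is acceptable for the handful of characters in a click CAPTCHA; longer sequences would call for beam search or a similar approximation.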
Keywords/Search Tags: Text CAPTCHA, Image CAPTCHA, CAPTCHA recognition, Deep learning, CTC