
Research On Text-based CAPTCHAs

Posted on: 2020-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: C X Tian
Full Text: PDF
GTID: 2428330590982226
Subject: Computer technology
Abstract/Summary:
CAPTCHAs are widely used in website login and registration to strengthen authentication and to prevent automated attacks by computer programs. Most mainstream websites use text-based CAPTCHAs because of their large password space and simple interaction mode. To make automatic recognition harder, current text-based CAPTCHAs generally use a random combination of security features, such as complex interfering backgrounds and warped, rotated, and overlapping characters. Because several security features are combined, traditional CAPTCHA identification methods achieve a very low recognition rate or fail entirely. To address this challenge, we propose a de-interference method based on Generative Adversarial Networks (GAN) that generates interference-free CAPTCHAs, and we then design three identification schemes according to the characteristics of the CAPTCHAs. The three schemes are summarized as follows.

(1) For hollow-character CAPTCHAs, the de-interference method turns the hollow characters into solid characters and stretches the character spacing. Based on this observation, we propose a GAN-based segmentation identification method that segments the stretched CAPTCHAs effectively; each segmented character is then identified by a Convolutional Neural Network (CNN).

(2) For solid-character CAPTCHAs, after applying the de-interference method, we propose a transfer-learning-based identification method. First, we generate a large number of synthetic CAPTCHAs based on the text distribution features and use them as training samples to train a CNN model. Next, we use a small number of real CAPTCHAs to perform transfer training on the pre-trained model. During transfer, the parameters of the first two layers of the pre-trained model remain unchanged, and the parameters of the other layers are updated by backpropagation. Finally, the transferred model is used to predict the real CAPTCHAs.

(3) For CAPTCHAs with solid characters whose text content is spliced from common word fragments, after applying the de-interference method, we propose a correction-model-based identification method. First, we use synthetic CAPTCHAs as training samples to train a CNN model and apply this recognition model to a small number of real CAPTCHAs to obtain predictions. Then, the predicted results and the true results are used to train a correction model with a spelling correction method from the Natural Language Processing (NLP) domain. Finally, we use the correction model to correct the results predicted by the recognition model.

In addition, because it is difficult to obtain a large number of real CAPTCHAs at low cost, this paper designs a program that simulates these real CAPTCHAs for network training. The training cost is far lower than that of other existing methods, while the training results are comparable. Extensive experimental results demonstrate that the proposed method can successfully identify CAPTCHAs collected from well-known websites such as Microsoft, Wikipedia, Baidu, Alipay, and Sina. In the best case, the recognition accuracy of our method is 63.7% higher than that of traditional methods.
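
The correction step in scheme (3) can be illustrated with a minimal Python sketch. The abstract does not specify which spelling correction method the thesis uses, so this is an assumption rather than the author's implementation: it counts character confusions from (predicted, true) pairs and then maps a new prediction to the same-length entry of a known vocabulary with the lowest substitution cost. The names `learn_confusions`, `substitution_cost`, and `correct`, the sample pairs, and the vocabulary are all hypothetical, and only same-length substitutions are handled.

```python
from collections import Counter


def learn_confusions(pairs):
    """Count how often the recognizer emitted character p where the truth was t."""
    counts = Counter()
    for predicted, truth in pairs:
        if len(predicted) != len(truth):
            continue  # assumption: this sketch only learns from same-length pairs
        for p, t in zip(predicted, truth):
            if p != t:
                counts[(p, t)] += 1
    return counts


def substitution_cost(p, t, confusions):
    """Cost of reading predicted character p as true character t."""
    if p == t:
        return 0.0
    # Confusions seen often during training are cheaper to undo than unseen ones.
    return 1.0 / (1.0 + confusions[(p, t)])


def correct(predicted, vocabulary, confusions):
    """Return the same-length vocabulary entry with the lowest total cost."""
    candidates = [w for w in vocabulary if len(w) == len(predicted)]
    if not candidates:
        return predicted  # nothing comparable in the vocabulary; keep the prediction
    return min(
        candidates,
        key=lambda w: sum(
            substitution_cost(p, t, confusions) for p, t in zip(predicted, w)
        ),
    )


# Hypothetical (recognizer output, ground truth) pairs from a few real CAPTCHAs.
pairs = [("c1ock", "clock"), ("he1lo", "hello"), ("w0rld", "world")]
confusions = learn_confusions(pairs)

# Hypothetical vocabulary of word fragments the CAPTCHA text is built from.
vocabulary = ["clock", "hello", "world", "water"]
corrected = correct("wor1d", vocabulary, confusions)
```

Weighting substitutions by observed confusion counts lets a handful of labelled real CAPTCHAs steer the correction toward the recognizer's typical mistakes; a fuller version would also need to handle insertions and deletions.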
Keywords/Search Tags:Text-based CAPTCHAs, CAPTCHAs identification, Generative adversarial networks, Transfer Learning