| CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. It is a security mechanism that websites use to distinguish malicious bots from legitimate users. Most deployed CAPTCHAs fall into two categories: text-based and image-based. Image-based CAPTCHAs require more bandwidth and are subject to more platform limitations, so text-based CAPTCHAs remain the most widely deployed type on current websites. Most current research still breaks text CAPTCHAs by combining segmentation-based image processing with deep-learning-based recognition. Such segmentation methods must be carefully designed for each CAPTCHA scheme; the process is tedious, and a method built for one scheme does not carry over to another. With the development of deep learning techniques, two questions arise: first, whether there is an end-to-end recognition method that needs no pre-processing steps; second, whether such a method is efficient and universal. This paper studies these questions in depth. The main work of this paper is as follows. (1) A general algorithm that recognizes CAPTCHAs end to end in a single step is proposed. After researchers broke the earliest simple, noise-free text CAPTCHAs, CAPTCHA designs became progressively more complex and difficult. CAPTCHA images now contain complex background interference, and the characters overlap and are deformed so heavily that it is difficult to find an effective algorithm for separating them. These factors also make it harder to recognize a CAPTCHA as a whole. To achieve high generality and efficiency, this paper therefore proposes a whole-image recognition algorithm based on a convolutional neural network. The algorithm takes only the original CAPTCHA image as input and, once the network has been trained, outputs the answer directly. Eliminating the pre-processing stage is one of the biggest advantages of the algorithm. (2) The universality and effectiveness of the proposed algorithm are verified. First, the method is verified on online CAPTCHAs, including the Google, Baidu, Yandex and Microsoft CAPTCHAs. Meanwhile, we selected 11 different forms of CAPTCHA from 8 websites that rank among the top 50 most-viewed sites in the world. On these real-world CAPTCHAs, the success rate ranges from 79.0% to 98.3%. Second, the method is verified on synthetic CAPTCHAs. We designed a CAPTCHA generation system and used it to generate a variety of difficult CAPTCHAs, including Chinese CAPTCHAs, style-transfer CAPTCHAs, selection-based CAPTCHAs and "two-layer" CAPTCHAs. On these types, the success rate ranges from 3.31% to 99.97% and each attack completes within 0.14 seconds, which shows that the algorithm is effective in both recognition accuracy and speed. A general model is also proposed, in which a single model recognizes many different CAPTCHAs simultaneously. Finally, we provide directions and suggestions for future text-based CAPTCHA design. |
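
The end-to-end idea above (image in, answer out, no segmentation step) can be illustrated with a small sketch of the final decoding and scoring steps. The abstract does not specify the network's output layer, so this sketch assumes a fixed-length output with one score vector per character position. The names `CHARSET`, `decode`, `success_rate` and `one_hot` are hypothetical, and counting a CAPTCHA as solved only when the whole string matches is likewise an assumption about how the reported success rates are measured.

```python
from typing import List, Sequence

# Hypothetical alphabet; a real CAPTCHA scheme defines its own character set.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"


def decode(scores: Sequence[Sequence[float]], charset: str = CHARSET) -> str:
    """Turn per-position class scores into the predicted answer string.

    `scores[k][i]` is the assumed network score for character `charset[i]`
    at position `k`. No segmentation is involved: the network is assumed
    to emit all positions at once from the whole image.
    """
    chars = []
    for position_scores in scores:
        if len(position_scores) != len(charset):
            raise ValueError("each position needs one score per character")
        # Index of the highest-scoring character at this position.
        best = max(range(len(charset)), key=lambda i: position_scores[i])
        chars.append(charset[best])
    return "".join(chars)


def success_rate(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of CAPTCHAs whose entire answer string is predicted correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    solved = sum(p == a for p, a in zip(predictions, answers))
    return solved / len(answers)


def one_hot(char: str, charset: str = CHARSET) -> List[float]:
    """Stand-in for a network output that is fully confident in `char`."""
    return [1.0 if c == char else 0.0 for c in charset]
```

For example, `decode([one_hot("a"), one_hot("7")])` yields `"a7"`. In a real attack the score vectors would come from the trained convolutional network rather than from `one_hot`.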