Automatic Recognition On Text-based CAPTCHAs With Convolutional Neural Networks

Posted on: 2018-11-12   Degree: Master   Type: Thesis
Country: China   Candidate: L Lei   Full Text: PDF
GTID: 2348330521950903   Subject: Computer software and theory
Abstract/Summary:
A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a security barrier that prevents computer programs from brute-forcing passwords, registering large numbers of accounts, and mounting other malicious attacks. Breaking a CAPTCHA exposes its security vulnerabilities, so that more secure schemes can be proposed to better guarantee network security. The most widely used kind today is the text-based CAPTCHA, an image containing several characters that the user must type in to pass the test. Some websites apply distortion algorithms to the characters to make them harder to break. Some add noise lines or complex backgrounds to make the scheme harder for a computer to recognize. Some make adjacent characters touch (adhesion), reducing the distance between them in order to prevent segmentation. Text-based CAPTCHAs are easy to generate and convenient for users to recognize, so many mainstream websites design their own to defend against malicious network attacks.

This paper selects five such schemes, each with a different style, and breaks them. In a Baidu CAPTCHA, some characters are isolated and some touch; some are hollow and some are solid; some have noise lines as shadows; and the fonts are diverse. In a Wikipedia CAPTCHA, the characters are mostly isolated and only simply distorted. In a Microsoft CAPTCHA, several characters are fully rotated, the characters are hollow or solid, and some are broken in the middle. In a Sina Weibo CAPTCHA, the characters are all solid, but the distortion and adhesion are severe. In an Apple CAPTCHA, the characters are hollow and the background is composed of several noise lines.

Earlier attacks on CAPTCHAs mostly followed three steps: preprocessing, segmentation, and recognition, with a machine learning algorithm often used for the recognition step. Such an algorithm, however, needs high-quality segmentation, and segmentation tends to perform poorly on CAPTCHAs with adhesion, which in turn degrades recognition. This paper follows the same three-step process but uses Convolutional Neural Networks in the recognition phase. Convolutional Neural Networks are robust to noise and, unlike traditional machine learning methods, do not need hand-crafted features, so they perform well in recognition.

The first step is preprocessing: the foreground is separated from the background and the image is binarized. Then touching characters are cut into equal-width pieces, and isolated characters are segmented with a color-filling segmentation algorithm. For schemes such as Microsoft and Apple, in which the number of characters in an image is not fixed, a Convolutional Neural Network first predicts the number of characters before segmentation. After segmentation, each small segmented image is recognized with a Convolutional Neural Network. Finally, the result for the CAPTCHA image is the sequence of the results for its segments.

The proposed method successfully broke the Baidu, Wikipedia, Microsoft, Sina Weibo, and Apple schemes, with success rates of 63.9%, 87.1%, 63.9%, 50.9%, and 54.4% respectively. In the phase of predicting the number of characters in an image, the success rate is 87.8% for Microsoft and 95.3% for Apple. All five success rates exceed 50%, which shows that Convolutional Neural Networks perform well and that the proposed method is effective at breaking text-based CAPTCHAs. The paper also analyzes the features of text-based CAPTCHAs and of recognition technologies, and proposes four suggestions to improve the security of text-based CAPTCHAs.
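The preprocessing and segmentation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the fixed threshold of 128, the use of 4-connected flood fill as a stand-in for the "color filling" segmentation, and the function names `binarize`, `flood_fill_components`, and `equal_cut` are all assumptions made for illustration. The CNN stages (character-count prediction and per-character recognition) are omitted.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an assumption; the abstract does not say how
    foreground and background are separated.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def flood_fill_components(binary):
    """Find 4-connected ink regions with a breadth-first flood fill.

    Returns the (left, right) column bounds of each region, inclusive,
    sorted from left to right.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    spans = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            left = right = x
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            spans.append((left, right))
    return sorted(spans)


def equal_cut(span, n_chars):
    """Split an inclusive column span into n_chars nearly equal-width pieces."""
    left, right = span
    width = right - left + 1
    edges = [left + (i * width) // n_chars for i in range(n_chars + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(n_chars)]


if __name__ == "__main__":
    # Toy 3x8 image: one isolated stroke at column 1, one touching blob at columns 4-6.
    image = [
        [255, 0, 255, 255, 0, 0, 0, 255],
        [255, 0, 255, 255, 0, 0, 0, 255],
        [255] * 8,
    ]
    regions = flood_fill_components(binarize(image))
    print(regions)                    # column bounds of each connected region
    print(equal_cut(regions[-1], 3))  # the touching blob cut into three equal pieces
```

In this sketch an isolated character falls out directly as one connected region, while a region that holds several touching characters is split by `equal_cut` once the expected number of characters is known; each resulting column span would then be cropped and passed to the recognizer.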
Keywords/Search Tags: text-based CAPTCHA, segmentation, recognition, convolutional neural networks