Research Of Text-Based CAPTCHA Attack Model On Skeleton Points Segmentation Algorithm

Posted on: 2020-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhao
Full Text: PDF
GTID: 2428330602450584
Subject: Computer Science and Technology
Abstract/Summary:
CAPTCHA, a Turing test designed to distinguish humans from computers, is the most widely used public automated security program in the world. CAPTCHAs are designed to ensure that every user accessing a network service is a human rather than a computer program. This not only ensures that network resources serve humans, but also prevents malicious automated attacks that cause network disruption and property loss. Currently, the world's major Internet companies use different kinds of CAPTCHAs, including text-based, voice, and behavior CAPTCHAs. Among these, text-based CAPTCHAs have become the most widely used form because of their ease of design and maintenance, small code size, etc. Consequently, automatic recognition of text-based CAPTCHAs has attracted much attention.

At present, the defense techniques adopted by text-based CAPTCHAs fall roughly into five categories: complex backgrounds, noise, lines, rotation, and overlapped characters. Using overlapped characters is a relatively reliable method. The overlapped characters in CAPTCHAs usually have 10% to 50% overlap, the length of the character string is not fixed, and the region each character occupies cannot be predicted in order to split the CAPTCHA. When a segmentation algorithm cannot accurately segment the characters, the recognition performance of the classifier also drops, which is why CAPTCHAs with overlapped characters are widely used. Recognizing CAPTCHAs with overlapped characters of indefinite length remains a research hotspot in the field.

In this thesis, a new attack model based on a skeleton segmentation algorithm is designed, and it can effectively recognize CAPTCHAs with overlapped characters. The main work is divided into the following points:

(1) In view of the shortcomings of traditional segmentation methods and the characteristics of overlapped CAPTCHAs, a new segmentation method is proposed. Unlike the traditional "slice-type segmentation", this method breaks the internal structure of the characters into pieces and recombines them, thereby separating the characters in the CAPTCHA. Compared with traditional segmentation algorithms (the CFS algorithm, the Three-bar algorithm, etc.), the proposed method separates and recognizes characters quickly and effectively, and the recognition accuracy is improved.

(2) A convolutional neural network (CNN) is employed as the character classifier. After analyzing the structural characteristics of CNNs, we modified the output layer of the LeNet-5 network to produce a probability-based confidence score for each character, enhancing the expressive power of the network. This model handles CAPTCHAs with overlapped characters very well.

(3) We designed a series of experiments to evaluate the validity and correctness of the CAPTCHA attack model. The experimental data set contains more than 10,000 CAPTCHAs collected from more than ten high-traffic websites worldwide, including Microsoft, Apple, and Wikipedia. The experimental results show that the model achieves good results. In addition, it is compared with traditional CAPTCHA attack techniques and state-of-the-art CAPTCHA segmentation algorithms; the results show that our attack model has certain advantages in dealing with overlapped characters.
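
The abstract does not specify how the probability-based confidence score is computed. As an illustration only, the sketch below assumes a standard softmax over the classifier's raw output scores and takes the highest probability as the confidence; the function names, the toy three-letter alphabet, and the example scores are hypothetical and not taken from the thesis.

```python
import math

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_confidence(logits, labels):
    """Return (label, confidence) for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores for one candidate character over a toy 3-class alphabet.
label, confidence = classify_with_confidence([2.0, 0.5, -1.0], ["A", "B", "C"])
print(label, round(confidence, 3))
```

A score of this kind could, in principle, let an attack model compare alternative ways of recombining character pieces and keep the most confident one; whether the thesis uses it this way is not stated in the abstract.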
Keywords/Search Tags: Text-based CAPTCHAs, convolutional neural networks, machine learning, overlapped characters