
Research On Complex Character Captcha Recognition Algorithm

Posted on: 2023-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Pan
Full Text: PDF
GTID: 2568306833982999
Subject: Engineering
Abstract/Summary:
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is an automated test that distinguishes whether a client is a human or a machine. It is widely used by major social networking sites and e-commerce platforms. To a certain extent, it ensures network security and protects website systems from malicious attacks by automated programs. CAPTCHA recognition technology emerged as a way to verify the security of CAPTCHA. According to their mode of operation, CAPTCHAs can be divided into text CAPTCHAs and behavior CAPTCHAs. The main research object of this thesis is the character-based text CAPTCHA. At present there are two main recognition modes: recognition based on character segmentation, and end-to-end recognition. For the first mode, based on an analysis of the advantages and disadvantages of the traditional drop-fall segmentation algorithm, this thesis improves the segmentation effect by optimizing the drop-fall algorithm. For the second mode, by analyzing the characteristics of some complex character CAPTCHAs that cannot be segmented effectively, this thesis constructs a deep learning model, fine-tunes it on the basis of transfer learning, and realizes end-to-end recognition of CAPTCHAs. The research work is divided into the following three stages:

(1) According to the characteristics of the various types of character text CAPTCHA collected in recent years, the CAPTCHAs are divided into easy-to-segment, hard-to-segment, and unsegmentable CAPTCHAs. For the first two, segmentable, categories, this stage studies and analyzes several segmentation techniques, including image preprocessing, projection segmentation, connected-domain segmentation, and the drop-fall algorithm. For unsegmentable CAPTCHAs, this stage studies and summarizes end-to-end recognition methods such as the VGG, Inception, and Xception networks.

(2) The traditional drop-fall algorithm over-segments CAPTCHAs whose characters have overlapping strokes, which leaves the segmented strokes incomplete. This stage optimizes the drop-fall algorithm by introducing cluster analysis, and carries out segmentation experiments with Jingdong Mall and Google CAPTCHAs as samples. First, the feature points of the CAPTCHA image are refined, and candidate regions are obtained by cluster analysis. Then, the starting point of the water drop is determined within a candidate region, and a selection function is established to determine the position the drop rolls to next. Finally, the initial water drop rolls along the character skeleton according to the rolling rules, which yields the segmentation path through the strokes shared by touching characters. After the CAPTCHA image has been segmented, a CNN model is constructed to recognize the segmented single characters, and the single-character recognition success rate is used as the evaluation index of the segmentation effect. The experimental results show that the recognition success rate for character CAPTCHAs segmented by the drop-fall algorithm based on cluster analysis is 25.3% higher than that of traditional methods, reaching 87.6%.

(3) Some CAPTCHAs exhibit touching characters, hollow characters, interfering color blocks, and inverted character and background colors. For such commonly used character CAPTCHAs, similar to those of Baidu and Tencent, known segmentation methods do not give satisfactory recognition results. In this stage, therefore, the character preprocessing and segmentation steps are removed, and a CNN is used to realize end-to-end recognition. The traditional Xception model does not reach the expected image feature extraction ability and tends to overfit, which easily leads to recognition failure. This stage therefore optimizes the model by modifying its network layer structure and introducing the SE-Net module between the depthwise separable convolution and ReLU. Then, using the large volume of CAPTCHA data collected and generated, the optimized model is compared with segmentation-based recognition methods and with other baseline end-to-end models. The experiments show that the drop-fall algorithm based on cluster analysis proposed in the previous stage achieves a recognition accuracy of less than 50% on the CAPTCHA data sets selected in this stage, far lower than the end-to-end recognition algorithm. In the comparison among end-to-end models, the recognition accuracy of the optimized Xception model on the two data sets is higher than that of the traditional Xception model by 4.13% and 3.07% respectively, higher than that of the Inception V3 network by 7.02% and 5.08% respectively, and higher than that of the VGG network by 10.41% and 9.47% respectively.

At the end of this stage, the recognition accuracy of the two algorithms in this thesis is compared across the four types of CAPTCHA, leading to the following conclusion. For segmentable CAPTCHAs, the drop-fall algorithm based on cluster analysis and the end-to-end recognition method based on the optimized Xception network achieve similar recognition accuracy, while the former has lower technical difficulty and data cost, so it is more applicable. For unsegmentable CAPTCHAs, the end-to-end recognition method based on the optimized Xception network is far more accurate than the segmentation-based method, so end-to-end recognition is more applicable.
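The SE-Net module named in stage (3) can be illustrated with a minimal sketch of a generic squeeze-and-excitation block. This is not the thesis's implementation: the function name `se_block`, the weight matrices `w1` and `w2`, the omission of bias terms, and the toy input are all assumptions made for illustration, and the sketch does not show where the block sits inside the Xception network.

```python
import math

def se_block(x, w1, w2):
    """Generic squeeze-and-excitation over a feature map (illustrative only).

    x  : list of C channels, each an H x W list of lists of floats
    w1 : (C // r) x C weight matrix of the bottleneck layer (hypothetical values)
    w2 : C x (C // r) weight matrix of the expansion layer (hypothetical values)
    Bias terms are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, part 1: bottleneck fully connected layer followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, part 2: expansion layer followed by a sigmoid gate in (0, 1).
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    y = [[[v * gate for v in row] for row in ch] for ch, gate in zip(x, s)]
    return y, s

# Toy input: 2 channels of size 2 x 2, reduction ratio r = 2.
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]        # 1 x 2
w2 = [[1.0], [-1.0]]     # 2 x 1
y, s = se_block(x, w1, w2)
print(s)  # per-channel gates
```

The gates in `s` reweight whole channels, so the block lets the network emphasize informative feature maps and suppress less useful ones without changing the spatial size of the output.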
Keywords/Search Tags: Network security, CAPTCHA, Drop-fall, CNN, Transfer learning