| A CAPTCHA, an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”, is a fully automated public test that determines whether a user is a human or a computer. To prevent enumeration attacks on network resources, anonymous communication tools such as Tor and JAP adopt text captchas with a high security level. None of the current open-source captcha recognition tools can identify these captchas effectively, so studying recognition methods for the captchas of anonymous communication tools is of great significance. Current research on text captcha recognition falls into two categories: whole-image recognition and segmentation-based recognition. Whole-image recognition requires a huge amount of training data and a correspondingly heavy manual labeling effort, while the key to segmentation-based recognition is designing a segmentation algorithm with high accuracy. To address these problems, this paper proposes a text captcha recognition method based on sliding windows that effectively recognizes Tor and JAP captchas. The main work and contributions are as follows. First, two noise reduction schemes are designed for Tor and JAP captchas. In Tor captchas both the noise and the characters are composed of points, and the points that make up the characters differ in color from the noise points, so a Tor denoising scheme is designed for point noise. JAP captchas contain a lot of block noise, but the character regions are relatively large and the overall trend of the characters is consistent, so a JAP noise reduction scheme is designed for block noise. The two schemes remove as much of the noise as possible and highlight the character features. Second, a captcha segmentation algorithm based on adaptive sliding windows is proposed to handle Tor and JAP captchas whose characters vary in size and are slanted, broken, or touching. An adaptive sliding window first segments each single character in multiple steps, and the resulting character fragments are recognized by a capsule network. A greedy strategy then selects the fragment with the best recognition probability to determine where the character is cut, so that segmentation and recognition are carried out together. With this method, the cracking success rates for Tor and JAP captchas are 78% and 15%, respectively. Third, transfer learning is adopted to further improve recognition accuracy. The public EMNIST character data set, which is similar to the target character data set, is used as the source domain to obtain a pre-trained model, and its knowledge is then transferred to the recognition model for the target character data set by fine-tuning the network parameters. Experiments show that this training mechanism further improves the recognition accuracy of the model: the cracking success rates for Tor and JAP captchas rise to 87% and 20%, respectively. |
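
The sliding-window segmentation with greedy selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `greedy_segment` and `toy_classifier`, the parameters `min_w`, `max_w`, and `step`, and the overlap-based confidence score are all hypothetical. In the paper the candidate fragments are scored by a capsule network, which the toy classifier here merely stands in for.

```python
from typing import Callable, List, Tuple

# (label, probability) returned by a recognizer for one candidate fragment
Prediction = Tuple[str, float]
Segment = Tuple[int, int, str, float]  # (left, right, label, probability)


def greedy_segment(
    width: int,
    classify: Callable[[int, int], Prediction],
    min_w: int,
    max_w: int,
    step: int = 1,
) -> List[Segment]:
    """Greedily split the column range [0, width) into characters.

    At each position every candidate window width is scored by `classify`,
    and the most confident candidate fixes the next cut point.
    """
    segments: List[Segment] = []
    left = 0
    while width - left >= min_w:
        best = None
        for w in range(min_w, max_w + 1, step):
            right = left + w
            if right > width:
                break  # window would run past the image edge
            label, prob = classify(left, right)
            if best is None or prob > best[3]:
                best = (left, right, label, prob)
        segments.append(best)
        left = best[1]  # continue from the chosen cut point
    return segments


def toy_classifier(true_spans: List[Tuple[int, int, str]]) -> Callable[[int, int], Prediction]:
    """Hypothetical stand-in for a trained recognizer (assumption).

    Confidence is the overlap ratio (intersection over union) between the
    window and the best-matching ground-truth character span.
    """
    def classify(left: int, right: int) -> Prediction:
        best_label, best_score = "?", 0.0
        for start, end, label in true_spans:
            overlap = max(0, min(end, right) - max(start, left))
            union = max(end, right) - min(start, left)
            score = overlap / union
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score
    return classify


if __name__ == "__main__":
    # Three characters of different widths in a 16-column image.
    spans = [(0, 5, "A"), (5, 12, "b"), (12, 16, "C")]
    result = greedy_segment(16, toy_classifier(spans), min_w=3, max_w=8)
    print("".join(label for _, _, label, _ in result))  # prints "AbC"
```

Because each cut is fixed as soon as the locally best window is found, the search stays linear in the number of characters, at the cost of not revisiting an early cut that later turns out to be poor.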