Design And Implementation Of Phishing Web Site Detection System Based On Text Similarity

Posted on: 2014-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: A Liu
Full Text: PDF
GTID: 2248330398972113
Subject: Computer technology
Abstract/Summary:
With the rapid development of network-related computer technology in recent decades, people rely increasingly on the internet. Driven by the illegal profits of phishing, phishers carry out more and more attacks. Anti-phishing detection can effectively reduce the harm done by phishing sites, clean up the internet environment, and promote the healthy development of the internet industry.

After analyzing the design principles of phishing sites, we found that a phishing site closely imitates the official website it targets. Based on this observation, we propose an anti-phishing detection system based on text similarity. The main work of the thesis is as follows:

1) We review anti-phishing detection techniques based on text matching, introduce the concept and calculation methods of text similarity required by the detection system, compare the text similarity algorithms, and choose VSM (vector space model) as the reference for the system design.

2) By tracing a typical phishing process and studying the design intention of the phisher, we design a phishing site detection model based on text similarity and build a reference text library organized by classification. The system compares a suspicious web page with the reference text library against a judging threshold: if the text similarity between the suspicious page and a reference text is higher than the threshold, the system judges the page to be a phish; otherwise it gives a non-phish result.

3) Following the data flow model of the detection system, we divide the system into four stages: URL matching, web page text similarity calculation, results output, and machine self-learning feedback. We describe each stage in detail and explain its design ideas. We analyze and design the text similarity calculation algorithm in depth, and propose two auxiliary optimization methods, Page-Specific Regex and whois-Specific Regex, to lower the wrong judgment rate of the system.

4) We test the system's functions with objective data and evaluate the system from three aspects: phish detection rate, wrong judgment rate, and strength of detection capability. We also evaluate its performance indicators and give its strengths and weaknesses as a reference for further research.
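
The similarity-and-threshold decision described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes raw term counts as the VSM weighting and cosine similarity as the measure, and the names `term_vector`, `cosine_similarity`, `classify`, the dictionary-shaped reference library, and the threshold value 0.8 are all hypothetical choices made for illustration. Only the similarity calculation stage is covered; the URL matching, results output, and self-learning feedback stages, as well as the two regex optimizations, are left out.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map lowercase word tokens to raw counts (an assumed, simple VSM weighting)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    # A Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(page_text, reference_library, threshold=0.8):
    """Compare a suspicious page with every reference text.

    Returns (label, best_reference_name, best_score). The page is labelled
    "phish" when its highest similarity exceeds the threshold (a hypothetical
    value here), and "non-phish" otherwise.
    """
    page_vec = term_vector(page_text)
    best_name, best_score = None, 0.0
    for name, ref_text in reference_library.items():
        score = cosine_similarity(page_vec, term_vector(ref_text))
        if score > best_score:
            best_name, best_score = name, score
    label = "phish" if best_score > threshold else "non-phish"
    return label, best_name, best_score


if __name__ == "__main__":
    # Hypothetical one-entry reference library keyed by the imitated site.
    library = {
        "example-bank": "welcome to example bank please sign in "
                        "with your account number and password",
    }
    suspicious = ("Welcome to Example Bank. Please sign in with your "
                  "account number and password!")
    print(classify(suspicious, library))
    print(classify("weekly recipes for pasta and soup", library))
```

In practice the threshold would have to be tuned on labelled data, since it trades the phish detection rate against the wrong judgment rate that the thesis evaluates.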
Keywords/Search Tags:text similarity, phish, detection, classification