Research Of Web Robot Detection Technology In Airlines' Booking Servers

Posted on: 2019-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: W K Chen
Full Text: PDF
GTID: 2382330548476290
Subject: Computer technology
Abstract/Summary:
A web robot is a program that automatically browses web pages and collects web data. It is a key technology in many network applications; for example, search engines rely on web robots to gather information from web pages. However, as web robot technology has developed, some malicious crawlers have caused irreparable damage to e-commerce: they consume network bandwidth, infringe on user privacy, and steal commercial information. In airline booking systems, web crawlers are even more harmful.

Therefore, this thesis presents an interactive visual system for airline ticketing systems that helps users detect crawlers by analyzing large-scale, dynamically changing IP log data. The system combines offline computation of the anti-crawler model with real-time online detection of crawlers: an anti-crawler model is built from historical log data and then applied to real-time access logs to determine whether the current access comes from a crawler. The offline model can also be retrained and updated regularly to adapt to new types of web robots. The system uses Redis caching to handle the high concurrency typical of e-commerce platforms such as air ticketing websites and to ensure a real-time detection response.

The visual interface designed in this thesis provides several kinds of visualization, such as an air route map, histograms, and pie charts, so that users can check the historical and real-time ticketing status and the effect of crawler detection at any time. IP address aggregation and a query sorting module help users analyze and identify dynamic-IP crawlers. Visualization modules such as feature selection and IP history detail query let users manually select training samples to update the SVM classification model.

The innovations of this thesis are as follows. First, we developed a general web robot detection system based on the E-Build server, which can replace the front-end anti-spider systems currently used by airlines. Second, the system considers the overall behavior of web robots, which lets it find a large number of dynamic IPs, and it provides a visual interface for updating the classifier efficiently to maintain the long-term effectiveness of the detection algorithm. Specifically, we use Redis caching to achieve real-time web robot detection under highly concurrent requests. Experimental results on access log data from an airline's E-Build server show that the system can effectively catch most crawlers, greatly reduce invalid queries, and update classification models conveniently to maintain the long-term effectiveness of the detection algorithm.
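The IP address aggregation and query sorting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the log schema (client IP plus queried route), the /24 prefix length, and the function names `subnet_of`, `aggregate_by_subnet`, and `rank_subnets` are all assumptions made for illustration. The idea is that a crawler rotating through dynamic IPs may look harmless per address but stand out once requests are grouped by subnet.

```python
import ipaddress
from collections import Counter, defaultdict

# Hypothetical access-log records: (client IP, queried route).
# The real log schema is not given in the abstract.
LOG = [
    ("203.0.113.5", "PEK-SHA"),
    ("203.0.113.6", "PEK-CAN"),
    ("203.0.113.7", "PEK-CTU"),
    ("203.0.113.8", "PEK-SZX"),
    ("203.0.113.5", "PEK-XIY"),
    ("198.51.100.20", "PEK-SHA"),
    ("198.51.100.20", "PEK-SHA"),
    ("192.0.2.9", "CAN-SHA"),
]


def subnet_of(ip: str, prefix: int = 24) -> str:
    """Return the network containing `ip`, e.g. '203.0.113.0/24'."""
    # strict=False lets us pass a host address and get its network back.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def aggregate_by_subnet(records, prefix: int = 24):
    """Group requests by subnet and collect simple per-subnet statistics."""
    stats = defaultdict(
        lambda: {"requests": 0, "ips": set(), "routes": Counter()}
    )
    for ip, route in records:
        entry = stats[subnet_of(ip, prefix)]
        entry["requests"] += 1
        entry["ips"].add(ip)
        entry["routes"][route] += 1
    return stats


def rank_subnets(stats):
    """Sort subnets by request count, highest first.

    Each row is (subnet, total requests, number of distinct IPs).
    """
    rows = [(net, s["requests"], len(s["ips"])) for net, s in stats.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for subnet, requests, distinct_ips in rank_subnets(aggregate_by_subnet(LOG)):
        print(f"{subnet}: {requests} requests from {distinct_ips} IPs")
```

In this toy log, no single address in `203.0.113.0/24` sends more than two requests, yet the subnet as a whole ranks first. A ranked view of this kind is one plausible way an analyst could pick candidate dynamic-IP crawlers as training samples; the features the thesis actually uses are not specified in the abstract.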
Keywords/Search Tags: Anti-Crawler, Booking System, Visual Analytics, Support Vector Machine, Redis