
Information Searching And Tracking System Of Network Downloading File

Posted on: 2022-08-25    Degree: Master    Type: Thesis
Country: China    Candidate: Q T Zhang    Full Text: PDF
GTID: 2518306524493474    Subject: Master of Engineering
Abstract/Summary:
In the roughly thirty years since its birth, the Internet has grown into a giant, and its services have evolved from simple text browsing to today's numerous and complex business functions. Downloading lies at the core of all these services, and along with convenience it brings trouble: rampant online downloading of pirated files. As China's network infrastructure develops rapidly and network throughput rises markedly, pirated files spread ever faster, and pirated-file downloading in China is now characterized by large transmission volumes and short effective lifetimes of download sources. Against this background, this thesis proposes the Information Searching and Tracking System of Network Downloading File (ISTS-NDF), a system for searching download information and tracking the download sources of pirated files. To address these problems, the thesis designs three subsystems covering both pirated and general file downloads. The main work is as follows:

1. An information-collection subsystem implements a series of crawlers that divide the web-page crawling process into distinct stages and functions. Crawlers for traversal, parsing, login, and replying are implemented separately, solving the problem of collecting information from deep web pages protected by human-machine verification.

2. Crawling code and page-parsing code are separated according to features extracted from web pages. A dynamic matching algorithm based on average sampling, together with dynamic loading of crawler code, matches crawler code to websites of the same type and reduces the workload of manual coding.

3. To collect pirated-file information at scale, the single-machine design is extended into a distributed crawler system. A weighted polling task-scheduling algorithm based on the historical performance data of crawler nodes solves the task-scheduling problem in the distributed setting, and de-duplication with incremental crawling of large-scale web pages is realized with a Bloom filter (minimal sketches of both appear after this summary). Docker virtualization provides automatic deployment and updating of code, and Scrapyd provides node state monitoring, redundancy backup, and self-healing, yielding an industrial-grade distributed crawler system with the capacity for large-scale network information collection.

4. On top of the collected download information, a tracking subsystem for downloaded files is designed and implemented. It can, to different degrees, verify the validity and trace the source of three forms of file sharing: direct download, network-disk sharing, and P2P download. This facilitates collecting and monitoring evidence of piracy and provides data support for law enforcement and related researchers.

5. A simple data-display subsystem and a function-management subsystem are designed and implemented with popular front-end and back-end technologies. The front end and back end are separated, giving the subsystems low functional coupling and high scalability.
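The weighted polling scheduler in point 3 can be read as a smooth weighted round-robin over crawler nodes, with weights derived from historical throughput. The Python sketch below is illustrative only, not the thesis's exact algorithm: the node names, the pages-per-minute metric, and the moving-average update are assumptions.

class WeightedPollingScheduler:
    """Illustrative weighted polling (smooth weighted round-robin) scheduler.

    Weights come from historical node performance; the hypothetical
    metric here is average pages crawled per minute.
    """

    def __init__(self, history):
        # history: {node_id: average pages/minute observed so far}
        self.weights = dict(history)
        self.current = {node: 0.0 for node in history}

    def pick_node(self):
        # Each round, every node's score grows by its weight; the highest
        # scorer takes the task and is pushed back by the weight total,
        # so faster nodes are chosen proportionally more often.
        total = sum(self.weights.values())
        for node, weight in self.weights.items():
            self.current[node] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

    def report(self, node, pages_per_minute, alpha=0.3):
        # Exponential moving average keeps weights tracking recent history.
        old = self.weights.get(node, pages_per_minute)
        self.weights[node] = (1 - alpha) * old + alpha * pages_per_minute
        self.current.setdefault(node, 0.0)

scheduler = WeightedPollingScheduler({"node-a": 120.0, "node-b": 40.0})
print([scheduler.pick_node() for _ in range(8)])
# "node-a" receives roughly three tasks for every one given to "node-b"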
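For the Bloom-filter de-duplication also named in point 3, a minimal sketch follows. The bit-array size, hash count, and double-hashing scheme are assumptions chosen for illustration; the thesis's actual parameters are not given in this abstract.

import hashlib

class BloomFilter:
    # Minimal Bloom filter for URL de-duplication and incremental crawling.
    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        a = int.from_bytes(digest[:8], "big")
        b = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (a + i * b) % self.size

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))

seen = BloomFilter()

def should_crawl(url):
    # A hit means the URL was (very probably) crawled before; a false
    # positive occasionally skips a new page but never crawls one twice.
    if url in seen:
        return False
    seen.add(url)
    return True

print(should_crawl("http://example.com/a"))  # True: first visit
print(should_crawl("http://example.com/a"))  # False: duplicate suppressed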
Keywords/Search Tags: distributed crawler, weighted polling algorithm, file download tracking, search engine