Research On Key Technologies For Tor Darknet Content Classification

Posted on:2024-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Zhang

Full Text:PDF

GTID:2558307067973369

Subject:Computer technology

Abstract/Summary:

With the continuous development of network technology, content on the Internet has become increasingly diverse and large in scale. Meanwhile, cyberspace security problems have been increasing, especially on anonymous networks such as the Tor (The Onion Router) dark web, where illegal and malicious activities have become easier to carry out. Criminals, in turn, often use jargon when committing illegal acts, which makes cybercrime harder to regulate. To secure cyberspace and address the information overload problem, an efficient and accurate content classification technology is needed. However, classifying Web pages and other content on the Tor dark web is very challenging. The main problems are the inefficiency of dark web data collection, the immaturity of Chinese jargon recognition, and the low performance of classifying large amounts of data. The main work of this thesis is therefore as follows:

(1) To address inefficient data collection on the dark web, this thesis modifies the Tor network. Specifically, it reduces the number of nodes that a Tor circuit passes through in order to improve access speed. In addition, to avoid acquiring duplicate data, it filters the data with cuckoo filters. On this basis, a distributed crawler is designed and implemented using the Scrapy framework, and the acquired data is stored in Elasticsearch. Experiments verify that the improved crawler system significantly speeds up dark web data acquisition while avoiding duplicate data, saving 39.75% of the time compared with ordinary crawlers.

(2) For the Chinese jargon recognition task, a jargon recognition method based on the SCM (Semantics Comparison Model) is proposed, and data preprocessing takes into account characteristics of Chinese text such as lexical properties and proper nouns. The jargon recognizer combines different features, shows significant advantages in Chinese jargon recognition, and achieves an accuracy of 87.66% in the experiments.

(3) For the text classification problem, this thesis proposes an information extraction method based on the LDA (Latent Dirichlet Allocation) topic model and Text-CNN (Text Convolutional Neural Network), aimed at research and work in the field of cybercrime. The method shows significant results in reducing noisy data and in improving model accuracy and execution efficiency. Experiments show that the scheme not only reduces overhead by more than 90% but also raises accuracy to 91.35%, which improves further to 94.88% once jargon is incorporated.

The research results of this thesis contribute to the field of darknet content classification and provide feasible solutions and practical insights for research and work in related fields.
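The duplicate filtering described in item (1) can be illustrated with a minimal cuckoo filter sketch in Python. This is not the thesis's implementation: the class name `CuckooFilter`, the helper `is_new_url`, the bucket parameters, and the use of SHA-256 for hashing are all assumptions chosen for illustration, and the abstract does not say whether URLs or page contents are filtered.

```python
import hashlib
import random


class CuckooFilter:
    """Illustrative cuckoo filter for approximate set membership.

    Hypothetical sketch: parameters and hashing are assumptions,
    not values taken from the thesis.
    """

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate
        # index inside the table and makes it its own inverse.
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # 16-bit fingerprint in the range 1..65535.
        return self._hash(b"fp:" + item.encode("utf-8")) % 0xFFFF + 1

    def _index(self, item: str) -> int:
        return self._hash(item.encode("utf-8")) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other candidate bucket is
        # derived from the current bucket and the fingerprint only.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def add(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: evict a resident fingerprint
        # and move it to its alternate bucket, repeating as needed.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Table is effectively full; the last evicted fingerprint is lost.
        return False

    def __contains__(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


def is_new_url(seen: CuckooFilter, url: str) -> bool:
    """Return True and record the URL if it has probably not been seen."""
    if url in seen:
        return False
    seen.add(url)
    return True
```

A crawler could call `is_new_url` before scheduling a request and skip the URL when it returns False. Like any cuckoo filter, this sketch can report false positives (a new URL occasionally treated as seen) but stores only short fingerprints, which keeps memory use low for large crawls.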

Keywords/Search Tags:

Dark Web, Crawler, Jargons, Text Classification

Related items

1	Design And Implementation Of News Website Crawler And Classification Retrieval Platform Based On Microservice
2	Design And Implementation Of Crawler Technology For Topics
3	Design And Implementation Of An Automatic Collection And Classification System For Web Text
4	Research And Design Of Machinery-Text Acquisition And Classification
5	Dark Web Classification Based On Image And Text Fusion Features
6	Research And Implementation Of Topic Crawler In The Field Of Inspection And Quarantine
7	Design And Implementation Of Classification System For Short Text User Comments
8	Research And Realization Of Chinese And English Vertical Search Engines On The Police
9	Design And Implementation Of Enterprise Portrait System Based On Text Classification
10	Research And Implementation Of Topic Web Crawler Oriented To Web Mining