
Design And Implementation Of The Winter Olympics News Text Collection And Classification Analysis System

Posted on: 2021-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: N Liu
Full Text: PDF
GTID: 2427330647464127
Subject: Computer technology
Abstract/Summary:
With the development of Internet technology, the amount of information on the network continues to grow. Most network data is text, but this text is scattered across sources, complex in content, and only coarsely categorized, which makes network information difficult to collect and analyze. To address the problems of difficult data collection and coarse text classification, this thesis designs and implements a Winter Olympics news text collection and classification analysis system in Python, based on focused crawler and text classification techniques. The system consists of three functional modules: a data collection module, a data classification module, and a data visualization module.

In the data collection module, a focused crawler is customized to collect news text related to the Winter Olympics. The collected data supports the classification and analysis of Winter Olympics information and achieves a preliminary integration of Winter Olympics information on the network.

The data classification module has two parts: data filtering and text classification. To screen out irrelevant information, local density and similarity are introduced into SNN, yielding an adaptive SNN algorithm based on local density and similarity (AK-SNN). To verify the performance of AK-SNN, comparative experiments were carried out on the UCI dataset and the Winter Olympics news text dataset. The experimental results show that AK-SNN has better robustness and prediction accuracy. To further classify the network text data, an extreme learning machine (ELM) is used as the classifier for multi-class classification of the text. The results show that ELM achieves good accuracy in multi-category text classification.

In the data visualization module, a web display interface is built with the Django framework to present the collection and classification results visually. To explore the potential value of the information, the classification results, news sources, news release dates, and other attributes were analyzed, and the analysis results are displayed. The system provides data and technical support for collecting and analyzing network information about the 2022 Winter Olympics. It also offers an approach to mining latent information in online news text about large-scale sports events.
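
As an illustration of the ELM classifier mentioned in the abstract, the following is a minimal sketch of a generic extreme learning machine, not the thesis implementation. It assumes the news texts have already been converted to numeric feature vectors. The class name ELMClassifier, the tanh activation, and the parameters n_hidden and seed are illustrative choices that do not come from the thesis. The hidden-layer weights are drawn at random and left fixed, and only the output weights are solved for, in closed form, with a pseudoinverse.

```python
import numpy as np


class ELMClassifier:
    """Generic single-hidden-layer extreme learning machine (illustrative)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden          # hypothetical default, not from the thesis
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random, fixed projection followed by a nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix: one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are sampled once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        # Pick the class whose output score is largest.
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

A model would be trained with ELMClassifier(n_hidden=40).fit(X, y) and applied with predict(X_new), where X holds one feature vector per news item and y holds the category labels. Because training reduces to a single linear solve, fitting is fast, which is the usual reason for choosing ELM for multi-class problems.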
Keywords/Search Tags: 2022 Winter Olympics, text data, focused crawlers, text classification, data analysis, data visualization